Hey everyone, I’ve been building LLM-based apps recently, and I kept running into the same problems:

- Prompt and model changes weren’t tracked properly
- No clean way to compare experiment results
- Evaluation logic ended up scattered across the codebase
- Hard to reproduce past results

So I built a small open-source project called Modelab for quickly A/B testing LLMs. The idea is simple:

- Version prompt / model experiments
- Run structured evaluations
- Track performance regressions
- Keep experiment logic clean and modular

I’m still shaping the direction, and I’d really value feedback from people building with LLMs:

- What’s missing from current eval workflows?
- What tools are you using instead?
- Would you prefer something event-based or decorator-based?

Repo: https://github.com/elliot736/modelab

Happy to hear thoughts, criticism, or ideas.
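To make the decorator-based vs event-based question concrete, here is a minimal sketch of what a decorator-based workflow could look like. All names here (`experiment`, `RESULTS`) are hypothetical illustrations, not Modelab's actual API:

```python
import functools

# In-memory store standing in for a real experiment-tracking backend.
RESULTS = []

def experiment(name, prompt_version):
    """Hypothetical decorator: records each call's inputs and output
    under a named, versioned experiment so runs can be compared later."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            output = fn(*args, **kwargs)
            RESULTS.append({
                "experiment": name,
                "prompt_version": prompt_version,
                "args": args,
                "output": output,
            })
            return output
        return wrapper
    return decorator

@experiment("summarize", prompt_version="v2")
def summarize(text):
    # Stand-in for an actual LLM call.
    return text[:20]

summarize("A long document about LLM evaluation workflows")
print(RESULTS[0]["experiment"])  # -> summarize
```

An event-based design would instead have callers emit log events explicitly (e.g. `track("summarize", output=...)`), which is more flexible but noisier at call sites; the decorator keeps experiment logic out of the function body, which seems to match the "clean and modular" goal above.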
Originally posted by u/marro7736 on r/ArtificialInteligence
