Original Reddit post

I was watching a great interview with Hamel Husain & Shreya Shankar about LLM evals. Their advice was to just spin up your own eval system tailored to your needs. But I also see startups offering products for scoring outputs and adding notes that seem flexible, and some agent frameworks have built-in eval systems. Which type of eval platform do you use: custom, standalone, or part of a framework?

Originally posted by u/thehashimwarren on r/ArtificialInteligence