Hey everyone, I’m doing some research for content I’m putting together around LLM observability, and I wanted to ask people actually building with local LLMs: what are your go-to ways to understand what’s happening inside your LLM apps? I’m trying to cover tracing, latency, token usage, failures, and debugging multi-step or agent workflows, but I want it grounded in real use cases, not just the theory you find in docs.

A few things I’d especially love to know:

- What do you check first when something breaks?
- Which metrics actually matter in your setup?
- How are you tracking token usage or cost? (rough sketch of what I mean below)
- How do you debug failures in RAG / agents / tool calls?
- What do most observability tools get wrong or miss?

One thing I’ve noticed: a lot of docs explain the concepts well, but it would’ve been way more helpful to see a real project walkthrough (“here’s how this is actually implemented end-to-end”). If you’ve felt that too, I’d love to hear about it.

The goal is to make something genuinely useful for people experimenting with local LLMs, so any insights, pain points, or “wish I knew this earlier” moments would really help. Thanks in advance!
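To be concrete about the level of instrumentation I mean, here’s a minimal sketch of the kind of per-request latency/token logging I’m asking about. It assumes a local OpenAI-compatible server (llama.cpp, Ollama, vLLM, etc. all expose this request/response shape); the endpoint URL, port, and model name are placeholders, not a specific tool’s API:

```python
import time
import json
import requests

# Assumed: a local OpenAI-compatible chat endpoint. URL, port, and
# model name below are placeholders for whatever you run locally.
ENDPOINT = "http://localhost:8080/v1/chat/completions"
MODEL = "local-model"

def traced_chat(messages, **params):
    """Send one chat request and log latency + token usage as a JSON line."""
    start = time.perf_counter()
    resp = requests.post(
        ENDPOINT,
        json={"model": MODEL, "messages": messages, **params},
        timeout=120,
    )
    resp.raise_for_status()
    data = resp.json()
    latency_s = time.perf_counter() - start

    # OpenAI-style "usage" block, if the server returns one.
    usage = data.get("usage", {})
    record = {
        "latency_s": round(latency_s, 3),
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "finish_reason": data["choices"][0].get("finish_reason"),
    }
    # In practice: append to a log file or ship to a collector
    # instead of printing.
    print(json.dumps(record))
    return data["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(traced_chat([{"role": "user", "content": "Say hi in five words."}]))
```

Is roll-your-own logging like this roughly what people do, or do you reach for a dedicated tracing tool?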
Originally posted by u/niga_chan on r/ArtificialInteligence
