Existing LLM monitors watch inputs: what users send, embedding distances, token counts, latency. That leaves a blind spot: silent failures. A silent failure is when your system prompt changes, your model gets swapped, or your deployment quietly degrades while user inputs look identical. Same inputs, same embeddings, zero signal. Your monitor sees nothing; your users notice before you do.

I built Sentry to fix this. It watches what your model actually generates, not what users send. One URL change, nothing else to configure.

Head-to-head test against embedding-based monitoring on identical traffic:

- Silent failure (system prompt changed silently, inputs identical): Sentry caught it in 2 requests; the embedding monitor took 9.
- Domain shift (traffic topic changed): both caught it in 1 request.
- Prompt injection: the embedding monitor was faster here; both detected it.

The silent-failure result is the one that matters. Input monitors are blind to it by definition: same inputs means same embeddings means no signal. Sentry watches outputs, so it catches what inputs can never reveal.

Here is what an actual detection looks like:

Status: DRIFT
Type: DOMAIN_SHIFT
Severity: P1 — Investigate within 30 min
Started generating: 'OAuth', 'webhook', 'payload'
Stopped generating: 'sorry', 'help', 'I'

That is a real output from a real test: you see exactly what changed and what to do about it. (Screenshot of a live detection above: real output, real API, real drift caught in 2 requests.)

Free to try. Source available on GitHub: free for research and non-commercial use, commercial license required for production deployments. One URL change to try it on your own setup.

GitHub: https://github.com/9hannahnine/bendex-sentry

Would love for people to test it and tell me what they find. ⭐ if this is useful.
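The "same inputs, same embeddings, zero signal" point can be shown with a toy sketch. Here a bag-of-words vector stands in for a real embedding model (this is an illustration of the blindness, not how any particular monitor is implemented):

```python
import math
from collections import Counter

# Toy illustration of why input-side monitors miss silent failures.
# A bag-of-words vector stands in for a real embedding model.

def embed(text):
    """Crude 'embedding': token counts of the text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

# The deployment silently breaks (system prompt swapped, model changed),
# but the user input is character-for-character identical:
input_before = "how do I reset my password"
input_after = "how do I reset my password"

drift_signal = 1.0 - cosine(embed(input_before), embed(input_after))
# Identical inputs -> identical embeddings -> zero distance, zero signal.
```

An input-side monitor computing any function of the request alone will produce the same value before and after the failure, which is why the silent-failure scenario has to be caught on the output side.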
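For intuition on an output-side check, here is a minimal vocabulary-drift sketch in the spirit of the "Started generating / Stopped generating" report shown above. This is NOT Sentry's actual algorithm; the function name, threshold, and data are all illustrative:

```python
from collections import Counter

# Hypothetical sketch: flag drift when tokens appear in (or vanish from)
# recent model outputs relative to a baseline window. Illustrative only.

def vocab_drift(baseline_outputs, recent_outputs, min_count=2):
    """Compare token vocabularies of baseline vs. recent model outputs."""
    base = Counter(t for text in baseline_outputs for t in text.lower().split())
    recent = Counter(t for text in recent_outputs for t in text.lower().split())
    # Tokens that became frequent but were never in the baseline, and vice versa.
    started = sorted(t for t, c in recent.items() if c >= min_count and t not in base)
    stopped = sorted(t for t, c in base.items() if c >= min_count and t not in recent)
    return {
        "status": "DRIFT" if (started or stopped) else "OK",
        "started_generating": started,
        "stopped_generating": stopped,
    }

baseline = ["sorry I can not help with that", "I can help you with billing"]
recent = ["configure the OAuth webhook payload", "send the webhook payload to OAuth"]
report = vocab_drift(baseline, recent)
```

Because the check runs on generations rather than requests, it fires even when every incoming request is unchanged, which is exactly the silent-failure case input monitors miss.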
Originally posted by u/Turbulent-Tap6723 on r/ArtificialInteligence
