I built Khazad, a semantic cache for LLM API calls that needs zero changes to your app code. Instead of wrapping SDKs or running a proxy, it patches the httpx transport layer. After init(), it intercepts outgoing LLM requests, embeds the conversation, and serves semantically-equivalent ones from a Redis 8 Vector Set. Any httpx-based SDK works out of the box: OpenAI, Anthropic, Gemini, Azure OpenAI, Mistral. Highlights:
- Model-aware
- Conversation-aware
- Streaming both ways Best for repetitive traffic like FAQ bots, RAG front-ends, and dev/CI runs. Python 3.10+, Redis 8, MIT licensed. Feedback welcome. GitHub: https://github.com/GuglielmoCerri/khazad submitted by /u/GugliC
Originally posted by u/GugliC on r/ArtificialInteligence
You must log in or # to comment.
