Original Reddit post

I built Khazad, a semantic cache for LLM API calls that needs zero changes to your app code. Instead of wrapping SDKs or running a proxy, it patches the httpx transport layer. After init(), it intercepts outgoing LLM requests, embeds the conversation, and serves semantically-equivalent ones from a Redis 8 Vector Set. Any httpx-based SDK works out of the box: OpenAI, Anthropic, Gemini, Azure OpenAI, Mistral. Highlights:

  • Model-aware
  • Conversation-aware
  • Streaming both ways Best for repetitive traffic like FAQ bots, RAG front-ends, and dev/CI runs. Python 3.10+, Redis 8, MIT licensed. Feedback welcome. GitHub: https://github.com/GuglielmoCerri/khazad submitted by /u/GugliC

Originally posted by u/GugliC on r/ArtificialInteligence