Open source repo: https://github.com/abhishekgandhi-neo/llm_council

This is a small framework we built internally for running multiple LLMs (local or API-based) on the same prompt, letting them critique each other's answers, and producing a final structured answer. The goal is to make "LLM councils" useful for evaluation workflows, not just demos.

What it supports

• Parallel inference across models
• Structured critique phase
• Deterministic aggregation
• Batch evaluation
• Inspectable outputs

It's intended for evaluation and reliability experiments with OSS models.

Why this matters for local models

When comparing local models, raw accuracy numbers don't always tell the full story. A critique phase can reveal reasoning errors, hallucinations, or model-specific blind spots.

Useful for:

• Comparing local models on your own dataset
• Testing the impact of quantization
• RAG validation with local embeddings
• Model-as-judge experiments
• Auto-labeling datasets

It supports provider-agnostic configs, so you can mix local models (vLLM, Ollama, etc.) with API models if needed.

Would love feedback on council strategies that work well for small models vs. large models.
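To make the three phases concrete, here is a minimal sketch of a council run. This is not the repo's actual API — `run_council`, the `(answer_fn, approve_fn)` model shape, and the stub models are all hypothetical. It shows parallel inference across models, a peer-critique phase (reduced here to a boolean "approve" per peer answer), and deterministic aggregation with name-ordered tie-breaking so repeated runs pick the same winner:

```python
from concurrent.futures import ThreadPoolExecutor

def run_council(models, prompt):
    """models: {name: (answer_fn, approve_fn)} where answer_fn(prompt) -> str
    and approve_fn(prompt, answer) -> bool. All names are illustrative."""
    # Phase 1: parallel inference — every model answers the same prompt.
    with ThreadPoolExecutor() as pool:
        answers = dict(pool.map(
            lambda item: (item[0], item[1][0](prompt)), models.items()))

    # Phase 2: critique — each model judges every peer's answer.
    # A real critique would likely be structured (error type, severity),
    # not a single boolean.
    approvals = {name: 0 for name in models}
    for critic, (_, approve) in models.items():
        for name, answer in answers.items():
            if name != critic and approve(prompt, answer):
                approvals[name] += 1

    # Phase 3: deterministic aggregation — most-approved answer wins;
    # sorting names first makes tie-breaking reproducible.
    winner = max(sorted(approvals), key=approvals.get)
    return {"answers": answers, "approvals": approvals, "final": answers[winner]}

# Toy usage: stub "models" return canned strings; a critic approves
# any non-empty peer answer.
stub = lambda text: (lambda p: text, lambda p, a: bool(a.strip()))
council = {"model_a": stub("Paris"), "model_b": stub("Paris"), "model_c": stub("")}
result = run_council(council, "Capital of France?")
# result["final"] == "Paris"; model_c's empty answer gets no approvals
```

Swapping the stubs for real callables (an Ollama or vLLM HTTP call, an API client) keeps the council logic unchanged, which is the point of a provider-agnostic design.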
Originally posted by u/gvij on r/ArtificialInteligence
