Original Reddit post

Even though I primarily use frontier (Claude) models every day, I try to keep an eye on the self-hosted AI model space, because I think innovation there has the potential to transform everyone's use of AI, not just those who can afford a pricey subscription. That said, I'm curious how (and how many) people out there are actually hosting and running inference on consumer hardware (i.e. a Mac mini or a standard gaming PC with one graphics card).

Some notes:

- If you have built a massive gaming rig with a bunch of high-end video cards, I am not super interested in your setup. This isn't a "post your rig" post.
- If you are using a mixture of local and frontier models, I am curious which tasks you keep local, which you hand to the cloud, and why.
- My setup cost (outside of my time) less than $1,100 total, plus my Claude Max subscription. I am curious about those who chose to spend less, and to some extent those who chose to spend more.

My setup

- Mac Mini M4 with 32GB of memory, running mlx-server and Ollama (for smaller models) as my desktop. I tried using vlm-mix, but it kept leaking memory and crashing.
- A custom build of aichat and llm-functions on my desktop, running out of a hybrid markdown context engine. Openclaw runs sometimes, and sometimes I turn it off when it gets into mischief.
- A separate "server laptop" sitting on my desk running Open WebUI, Neo4j, and Postgres. Web search via SearXNG, and an open terminal on this server integrated with Open WebUI. No OpenRouter (yet).

My models

Running simultaneously:

- Qwen3.5-35B-A3B-4bit (with tool calling, reasoning, etc.)
- Gemma3:4b

Quick questions go directly to Gemma3; more in-depth or coding questions go to Qwen. Really complicated things run through Claude and MCP, which integrates with the local models to save tokens.

Conclusion

It works well for my purposes, but I am mostly curious what works for you all. This is an awesome community, and I would love to learn from what you have settled on for day-to-day LLM use.
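The tiered routing described above (quick questions to a small local model, deeper or coding questions to a larger local model, the really hard stuff to a frontier model) could be sketched roughly as below. This is a hypothetical illustration, not the poster's actual config: the model names, keyword heuristics, and thresholds are all assumptions.

```python
# Hypothetical sketch of the post's routing tiers. In practice the chosen
# model name would be passed to a local OpenAI-compatible endpoint (both
# Ollama and mlx-server expose one); here we only show the selection logic.

# Keywords suggesting a task too complicated for local models (assumed).
ESCALATE_HINTS = ("architecture", "design a system", "refactor the whole")
# Keywords suggesting a coding / in-depth question (assumed).
CODING_HINTS = ("def ", "class ", "error", "traceback", "```")

def pick_model(prompt: str) -> str:
    """Return a model name based on rough prompt heuristics."""
    text = prompt.lower()
    if any(hint in text for hint in ESCALATE_HINTS):
        return "claude"          # really complicated: send to the frontier model
    if any(hint in text for hint in CODING_HINTS) or len(prompt) > 400:
        return "qwen3.5-35b-a3b" # in-depth or coding: larger local model
    return "gemma3:4b"           # quick question: small local model
```

A real router would likely combine this with context-length checks or a cheap classifier pass, but simple keyword and length heuristics are often enough to keep most traffic on the small model.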

Originally posted by u/Solid_Temporary_6440 on r/ArtificialInteligence