Title: Looking for Advice About Local Coding Models on a CPU Server

Just joined this community and wanted to ask people with actual experience running local models for real development work. Sorry in advance if my English isn't perfect; I'm from Turkey.

I work at a finance company, and most of my time goes into building and maintaining internal systems and software we also provide to clients: mostly CRM systems, backend services, automations, integrations, telephony infrastructure, and internal tools.

I should probably clarify that I'm not coming from a vibe-coding background. I've been involved with software, electronics, and infrastructure work for close to 10 years, so I already understand the systems and codebases I work on. AI mostly became a productivity tool for me: refactoring, repetitive implementation work, codebase navigation, UI improvements, and speeding up ongoing projects.

Until recently I was mostly using Claude for development assistance and multitasking, but lately I've become more interested in running models locally. Right now I have access to a dual Xeon E5-2650 v2 server with 256GB RAM and 2TB of SSD storage, no GPU currently. I'm not planning to host models for users or build an AI SaaS around this; I mainly want a strong local coding and reasoning assistant for existing projects and large codebases. At the moment I'm mostly looking into Qwen coder models, DeepSeek coder models, and Ollama/llama.cpp setups.

For people already doing production development locally: what has realistically worked best for you on CPU-heavy systems like this? Do you find smaller, faster coding models more practical, or are larger reasoning-focused models worth the slowdown? I'm also curious how local coding models currently compare to Claude for existing-project development rather than greenfield generation.
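For context, the kind of llama.cpp setup being considered might look something like this on that hardware. This is only a sketch: the GGUF filename, the Q4_K_M quant level, and the specific flag values are assumptions, not a tested recipe.

```shell
# Sketch of a CPU-only llama.cpp server for a quantized coder model.
# Assumptions: a local GGUF file of a Qwen coder model at Q4_K_M
# (a 32B model at Q4 is very roughly 20 GB, well within 256GB RAM).
#   -t 16             : thread count (dual E5-2650 v2 = 16 physical cores)
#   -c 16384          : context size, for working across larger files
#   --numa distribute : spread model weights across both sockets' memory
./llama-server \
  -m qwen2.5-coder-32b-instruct-q4_k_m.gguf \
  -t 16 -c 16384 \
  --numa distribute \
  --host 127.0.0.1 --port 8080
```

CPU inference on DDR3-era Xeons is mostly memory-bandwidth-bound, which is why quant level and model size tend to matter more than core count here.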
Originally posted by u/Professional-Maize31 on r/ClaudeCode
