After burning way too much money on GPT-4 API calls, I finally went all-in on local models with Ollama. The progress in 2026 is insane: Qwen2.5-Coder and DeepSeek-Coder-V2 are genuinely competitive with GPT-3.5/4 on many coding tasks.

Here's the short version (full write-up with commands and benchmarks on Medium):

- **4–6 GB VRAM** → Qwen2.5-Coder:3B or DeepSeek-Coder-V2-Lite:6.7B. Good for autocomplete and small refactors.
- **8 GB VRAM (sweet spot)** → Qwen2.5-Coder:7B. 60+ tok/s, 128k context, beats GPT-3.5 on many coding benchmarks. Just run `ollama run qwen2.5-coder:7b` and you're done.
- **10–12 GB VRAM** → DeepSeek-Coder-V2:16B. Near GPT-4 level on Python/C++.
- **16–20 GB VRAM** → Qwen2.5-Coder:32B or Codestral-22B. Full-project understanding.
- **24+ GB VRAM** → Llama 3.3 70B (code fine-tune). Frontier-class, runs on an RTX 5090 or a Mac Studio.

Why go local in 2026?

- Zero API costs
- Complete privacy (no code sent to the cloud)
- Works offline
- 50+ tok/s on mid-range GPUs

VS Code setup: use Continue.dev with Ollama as the provider. I set Qwen 7B for autocomplete and DeepSeek 16B for chat. Feels like Copilot, but free and private.
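The VRAM tiers above boil down to a simple lookup, which can be sketched as a tiny shell helper. `pick_model` is a hypothetical function name, and the thresholds and tags just mirror the recommendations in this post (tags follow Ollama's `name:tag` convention):

```shell
#!/bin/sh
# Map available VRAM (in GB) to the model tier recommended above.
# pick_model is a hypothetical helper, not part of the Ollama CLI.
pick_model() {
  vram=$1
  if   [ "$vram" -ge 24 ]; then echo "llama3.3:70b"
  elif [ "$vram" -ge 16 ]; then echo "qwen2.5-coder:32b"
  elif [ "$vram" -ge 10 ]; then echo "deepseek-coder-v2:16b"
  elif [ "$vram" -ge 8  ]; then echo "qwen2.5-coder:7b"
  else                          echo "qwen2.5-coder:3b"
  fi
}

pick_model 8    # prints: qwen2.5-coder:7b
pick_model 24   # prints: llama3.3:70b
```

On NVIDIA cards, `nvidia-smi` will tell you how much VRAM you actually have to feed into a helper like this.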
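Everything Ollama does locally goes through an HTTP API on port 11434 (`/api/tags` lists installed models, `/api/generate` runs a one-shot completion), which is what editor integrations talk to. A minimal sketch, assuming the default port; `ollama_generate` is a hypothetical wrapper, and the model tag and prompt are just examples:

```shell
#!/bin/sh
# Minimal sketch of calling Ollama's local HTTP API (default http://localhost:11434).
# ollama_generate is a hypothetical wrapper, not part of the Ollama CLI.
ollama_generate() {
  model=$1
  prompt=$2
  # /api/tags answers only if the server is up; /api/generate runs the completion.
  if curl -sf http://localhost:11434/api/tags >/dev/null 2>&1; then
    curl -s http://localhost:11434/api/generate \
      -d "{\"model\": \"$model\", \"prompt\": \"$prompt\", \"stream\": false}"
  else
    echo "Ollama server not reachable on localhost:11434"
  fi
}

ollama_generate "qwen2.5-coder:7b" "Write a Python hello world."
```

If the server isn't running, start it with `ollama serve` (the desktop app starts it for you).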
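The Continue.dev setup amounts to a config file pointing the chat and autocomplete roles at Ollama. This is a sketch assuming Continue's older JSON `config.json` schema (newer releases moved to a YAML config, and the titles here are made up), so check the Continue docs for your version:

```shell
#!/bin/sh
# Sketch: point Continue.dev's chat and autocomplete roles at local Ollama models.
# Assumes Continue's older config.json schema; newer versions use config.yaml,
# so treat the field names here as an assumption, not the definitive format.
mkdir -p "$HOME/.continue"
cat > "$HOME/.continue/config.json" <<'EOF'
{
  "models": [
    { "title": "DeepSeek 16B (chat)", "provider": "ollama", "model": "deepseek-coder-v2:16b" }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen 7B (autocomplete)", "provider": "ollama", "model": "qwen2.5-coder:7b"
  }
}
EOF
```

Splitting roles like this is the point: a small fast model handles keystroke-level autocomplete while the bigger model only pays its latency cost in chat.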
Originally posted by u/Remarkable-Dark2840 on r/ArtificialInteligence
