Original Reddit post

After spending way too many hours testing local models (Llama 3, Mistral, Qwen, DeepSeek) on different hardware, I realised one thing: VRAM is everything. A 16GB card beats a faster 8GB card every time for LLM inference. So I put together three complete PC builds that prioritise VRAM per dollar. No fluff, just parts that actually work for local AI.

Budget build – ~$899

  • GPU: RTX 4060 Ti 16GB (critical: the 16GB version, not 8GB)
  • CPU: Ryzen 5 5600X
  • RAM: 32GB DDR4
  • Runs: 7B–13B models at 30–50 tok/s, 13B–20B with Q4 quantization
  • Best for: beginners, students, Ollama on a budget

Mid‑range – ~$1,599
  • GPU: RTX 4070 Super 12GB
  • CPU: Ryzen 7 7700X
  • RAM: 64GB DDR5
  • Runs: 34B models (Q4) at 20–30 tok/s, 16B models at full speed
  • Best for: developers, enthusiasts, 90% of local LLM use cases

Pro build – ~$2,899
  • GPU: RTX 4090 24GB
  • CPU: Ryzen 9 7900X
  • RAM: 96GB DDR5
  • Runs: 70B models (Q4) at 15–20 tok/s, fine‑tune 7B models
  • Best for: researchers, heavy fine‑tuning, running the largest open models

Why these parts?
  • VRAM > raw GPU speed (consensus in the local LLM community)
  • 32GB RAM is the new minimum (context eats memory)
  • NVIDIA + CUDA = still the least painful path (sorry AMD fans)

Note: Prices have been fluctuating a lot recently.
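To see why VRAM capacity matters more than raw speed, here is a rough back-of-the-envelope sizing sketch. It is not from the original post: the ~0.5 bytes/parameter figure for Q4, the 10% overhead factor, and the example model dimensions (40 layers, 5120 hidden size for a 13B-class model) are all my assumptions, not measured values.

```python
# Rough VRAM sizing sketch for local LLM inference (assumptions, not benchmarks):
# Q4 quantization ~0.5 bytes per weight plus ~10% runtime overhead; fp16 KV cache.

def model_vram_gb(params_billion, bytes_per_param=0.5, overhead=1.10):
    """Approximate VRAM (GB) needed just to hold the quantized weights."""
    return params_billion * 1e9 * bytes_per_param * overhead / 1024**3

def kv_cache_gb(context_len, layers, hidden_dim, bytes_per_elem=2):
    """Approximate fp16 KV-cache size: one K and one V tensor per layer."""
    return 2 * layers * context_len * hidden_dim * bytes_per_elem / 1024**3

# Example: a 13B-class model (hypothetical dims: 40 layers, 5120 hidden size).
weights = model_vram_gb(13)           # ~6.7 GB of weights at Q4
cache = kv_cache_gb(4096, 40, 5120)   # ~3.1 GB of KV cache at 4k context
print(f"13B Q4 weights ~{weights:.1f} GB + 4k KV cache ~{cache:.1f} GB")

# A 34B model is ~17 GB of Q4 weights alone -- over a 12GB card's capacity,
# so it only runs there by offloading layers to system RAM (hence the
# "32GB RAM is the new minimum" point above).
print(f"34B Q4 weights ~{model_vram_gb(34):.1f} GB")
```

Under these assumptions, a 13B Q4 model plus cache fits comfortably on a 16GB card, while the 24GB card is the smallest that holds 70B-class models at Q4 without heavy offloading, which matches the tier split in the builds above.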

Originally posted by u/Remarkable-Dark2840 on r/ArtificialInteligence