Original Reddit post

I’m about to invest ~€4,000 into a fully local AI server for a company, and I’m starting to think most online recommendations completely miss the real issue. I don’t care about gaming benchmarks or raw tokens/sec. I care about reliable, production-grade AI for real business use, not a hobby setup.

🏢 Use case

Small manufacturing company (3 users). The AI must:

- Process ~10,000 technical PDFs (standards, manuals, engineering docs)
- Use a ~60GB structured database (products, customers, pricing history)
- Work with CAD-related documentation (STEP files described via technical docs/PDFs)
- Generate fully correct offers automatically (technical + pricing logic)
- Handle marketing content and product descriptions
- Support product development (engineering suggestions, improvements)
- Run fully local (no API, no cloud allowed)

🎯 Key requirement

Correctness > speed

I’m fine with 1–5 minute response times if output quality is significantly better. Hallucinations in technical or pricing-related outputs are not acceptable.

⚔️ My current dilemma

I’m stuck between two approaches:

Option A: DGX Spark / Unified Memory (128GB)

- very large models possible locally
- entire context stays in memory
- less fragmentation across documents
- likely slower inference

Option B: RTX 5090 / CUDA server (32GB VRAM)

- extremely fast inference
- better ecosystem support (CUDA, tooling, ComfyUI, etc.)
- but heavily limited context size
- more aggressive RAG splitting required

❗ My controversial question:

Isn’t it actually obvious in 2026 that for real knowledge work, memory capacity matters more than raw GPU speed?

Or differently: Does a 5090-based setup even make sense if I need consistent reasoning over thousands of documents + technical CAD knowledge?

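To make the Option A vs. Option B memory trade-off concrete, here is a rough back-of-the-envelope sketch. It only uses the generic rule that weight memory is parameters × bits / 8 and that the KV cache stores one key and one value vector per layer per token. The two model shapes, the 2 GB flat overhead, and the 32,768-token context are assumptions for illustration, not figures from the post. It also treats the full 128GB and 32GB as usable, which is optimistic (unified memory is shared with the OS), and it says nothing about output correctness, only about what fits.

```python
from dataclasses import dataclass

GB = 1e9  # decimal gigabytes, to match the "32GB" / "128GB" figures in the post


@dataclass
class ModelShape:
    """Hypothetical transformer shape (assumed for illustration only)."""
    params_billion: float  # total parameters, in billions
    n_layers: int          # number of transformer layers
    n_kv_heads: int        # key/value heads (grouped-query attention)
    head_dim: int          # dimension of each attention head


def weights_gb(shape: ModelShape, bits_per_weight: int) -> float:
    """Memory for the weights alone at a given quantization width."""
    return shape.params_billion * 1e9 * bits_per_weight / 8 / GB


def kv_cache_gb(shape: ModelShape, context_tokens: int, bytes_per_value: int = 2) -> float:
    """KV cache for one sequence: a key and a value per layer, head, and token."""
    per_token = 2 * shape.n_layers * shape.n_kv_heads * shape.head_dim * bytes_per_value
    return per_token * context_tokens / GB


def fits(shape: ModelShape, bits_per_weight: int, context_tokens: int,
         memory_gb: float, overhead_gb: float = 2.0) -> tuple[bool, float]:
    """Return (fits?, estimated need in GB). overhead_gb is an assumed flat allowance."""
    need = weights_gb(shape, bits_per_weight) + kv_cache_gb(shape, context_tokens) + overhead_gb
    return need <= memory_gb, need


# Two assumed shapes: a "70B-class" and an "8B-class" model.
LARGE = ModelShape(params_billion=70, n_layers=80, n_kv_heads=8, head_dim=128)
SMALL = ModelShape(params_billion=8, n_layers=32, n_kv_heads=8, head_dim=128)

if __name__ == "__main__":
    for name, shape, bits in [("large, 4-bit", LARGE, 4), ("small, 8-bit", SMALL, 8)]:
        for label, budget in [("Option A (128GB)", 128), ("Option B (32GB)", 32)]:
            ok, need = fits(shape, bits, context_tokens=32_768, memory_gb=budget)
            print(f"{name:13s} on {label}: needs ~{need:.1f} GB, fits={ok}")
```

Under these assumptions the large model needs roughly 48 GB at a 32,768-token context, so it fits in 128GB but not in 32GB, while the small model fits in both. Note that even a long context window covers only a small slice of ~10,000 PDFs, so some form of retrieval is likely needed on either option; the sketch only shows how much model and context each memory budget leaves room for.
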
⚠️ What I’m trying to avoid:

- fragmented reasoning across documents
- hallucinated technical or product data
- incorrect offer generation
- loss of context across engineering history
- unstable RAG behavior

💬 Honest question to the community:

What actually matters more in production environments?

- larger model capacity + unified memory (DGX Spark class systems)

OR

- faster GPUs + smaller models (RTX 4090/5090 setups)

I have a feeling most people default to “more GPU = better”, but in my case that assumption might be completely wrong.

Would love to hear from people who actually run serious local AI systems in companies, not just home labs.

submitted by /u/No-Solution6262

Originally posted by u/No-Solution6262 on r/ArtificialInteligence