Original Reddit post

Jensen just blessed us, folks. NVIDIA just announced a “desktop” supercomputer for Windows that can natively run a 1-Trillion parameter AI. They say it’s for “enterprise data scientists,” but we all know what this is actually for: running uncensored Waifu chatbots at 500 tokens per second. Here is the TL;DR of the hardware specs: VRAM: Enough to make a grown man cry (and finally stop daisy-chaining used Tesla P40s with zip-ties). Cooling: Liquid-cooled. Doubles as a space heater. It will completely solve the winter heating bill for your entire neighborhood. Power: Requires a direct line to your local nuclear power plant. Price: Just your soul, your house, and a 50-year enterprise mortgage. 🦙 The Real Question: Running LLaMA-Behemoth We all know Meta is going to drop LLaMA-Behemoth-1T-Instruct any day now. But let’s be real about how this sub is actually going to handle it. Even with a multi-hundred-thousand-dollar DGX workstation on our desks, we are still going to aggressively quantize it because we refuse to close our 400 Chrome tabs while inferencing. The r/LocalLLaMA Quantization Roadmap for LLaMA-Behemoth-1T: I’m already selling my kidneys to afford the down payment on this DGX Station. Can’t wait to run the 1-bit quantization of Behemoth so it can confidently explain to me why 2+2=5 in 40 different languages simultaneously. Who else is pre-ordering? submitted by /u/Winter_Engineer2163

Originally posted by u/Winter_Engineer2163 on r/ArtificialInteligence