Just a theoretical question – say you have access to the implementation, including all the weights, of a frontier model like GPT 5.5 or Opus 4.6, so essentially you're OpenAI or Anthropic. What would be the marginal cost and power consumption of an (inference-only) deployment of those models, scaled down to serve only a single user but with speed and intelligence comparable to the public version? Would this calculation change much between reasoning and non-reasoning versions of the models?

I guess my question is how much of OpenAI's/Anthropic's total infrastructure cost goes into creating the intelligence, and how much goes into the parallelism, i.e. the ability to serve many users simultaneously. I'm asking because I've been wondering what the primary limiting resource is that prevents open models from being as good as frontier ones – is it lack of engineering, lack of training data and time, or simply lack of money for any somewhat larger organization to deploy its own instance?
Originally posted by u/multi_io on r/ArtificialInteligence
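One way to frame the "intelligence vs. parallelism" split is a back-of-envelope estimate of single-user serving. At batch size 1, decoding is memory-bandwidth bound: every generated token requires streaming essentially all of the weights through the GPUs once, so the hardware sits mostly idle on compute, and batching many users is what amortizes that cost. Below is a rough sketch. Every number in it (parameter count, GPU specs, rental price) is an illustrative assumption, not a figure for any actual frontier model:

```python
import math

# --- All values below are assumptions for illustration only ---
PARAMS_B = 400          # assumed active parameter count, billions
BYTES_PER_PARAM = 2     # bf16/fp16 weights
GPU_HBM_GB = 80         # HBM capacity per GPU (H100-class)
HBM_BW_TBPS = 3.35      # HBM bandwidth per GPU, TB/s (H100-class)
GPU_POWER_KW = 0.7      # assumed per-GPU power draw, kW
GPU_HOURLY_USD = 2.5    # assumed cloud rental price per GPU-hour

# Total weight footprint in TB
weights_tb = PARAMS_B * 1e9 * BYTES_PER_PARAM / 1e12

# GPUs needed just to hold the weights (ignoring KV cache, activations)
gpus = math.ceil(weights_tb * 1000 / GPU_HBM_GB)

# Batch-1 decode: each token streams all weights once, split across
# the GPUs' aggregate memory bandwidth (idealized upper bound)
tokens_per_s = gpus * HBM_BW_TBPS / weights_tb

# Marginal cost and power for this single-user deployment
cost_per_mtok = gpus * GPU_HOURLY_USD / (tokens_per_s * 3600) * 1e6
power_kw = gpus * GPU_POWER_KW

print(f"{gpus} GPUs, ~{tokens_per_s:.0f} tok/s, "
      f"~${cost_per_mtok:.0f}/Mtok, ~{power_kw:.1f} kW")
```

Under these assumptions you get a node of roughly 10 GPUs drawing a few kilowatts, generating tens of tokens per second, at a per-token cost orders of magnitude above public API prices. That gap is exactly the batching effect the question asks about: serving many users on the same weights reuses each weight-streaming pass across the whole batch, which is where most of the cost efficiency of a production deployment comes from. Reasoning models change the picture mainly by multiplying the number of tokens generated per answer, not the per-token economics.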
