Original Reddit post

I want serious technical estimates from people who understand AI scaling. This is a purely hypothetical scenario, so please don’t derail into “AGI is impossible” or philosophical debates. Assume everything below is already solved.

Assumptions (IMPORTANT)

Imagine we build an AI with:

- Perfect, fully cleaned, verified data (no noise / no misinformation)
- Complete human knowledge:
  - all books
  - all scientific papers
  - all textbooks
  - expert-curated knowledge from top scientists
- Structured + refined datasets
- Best possible modern architecture (transformer or beyond)
- Advanced reasoning methods included
- Tool use (search, code execution, memory systems, simulators)
- Unlimited compute budget

Questions

  1. Model size (parameters)
     In this scenario, what is the realistic scale of the model?
     - ~1T parameters?
     - ~10T?
     - ~100T?
     - Or does parameter scaling stop mattering here?
  2. Data size (storage)
     If everything is fully refined and high-quality, how much storage would the dataset actually require?
     - 100 TB?
     - 1 PB?
     - 10–50 PB?
     - More?

     Also assume:
     - deduped data
     - compressed representations allowed
     - no low-quality noise
  3. Compute requirements
     For training such a system:
     - GPU/accelerator count (order of magnitude)
     - Training time (months / years)
     - Power requirements (rough estimate)

     Would this even be physically feasible?
  4. Key limitation question
     If we already assume:
     - perfect data
     - perfect architecture
     - perfect reasoning methods
     - perfect tool use

     then what becomes the real bottleneck?
     - compute?
     - memory bandwidth?
     - algorithmic limits?
     - energy?
     - something else?
  5. Scientific discovery speed
     Most important question: if such a system exists, would it be able to:
     - discover new scientific laws faster than humans?
     - generate new technologies autonomously?
     - replace large parts of research work?

     If yes, how much faster than current human science?
     - 2×?
     - 10×?
     - 100×?
     - or exponential acceleration?

     And what would limit that speed (experiments, compute, real-world testing, etc.)?

Context

I understand current models are limited by scaling laws and data quality. This question is about the upper theoretical bound if those constraints are removed.

TL;DR

If we had:

- perfect knowledge dataset
- best AI architecture
- unlimited compute

what would be:

- model size (TB/PB/parameters)?
- compute scale?
- scientific discovery speed multiplier?

If you know papers, scaling laws, or serious estimates, please share.

submitted by /u/radhe262772
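
To make questions 1 to 3 concrete, here is a rough back-of-envelope sketch of how the candidate parameter counts could be turned into token, storage, and compute figures. Everything in it is an assumption and not part of the scenario above: the ~20 tokens-per-parameter ratio (the commonly quoted compute-optimal rule of thumb), the ~6 FLOPs per parameter per token approximation for dense transformer training, ~4 bytes of raw text per token, and a hypothetical accelerator with 1e15 FLOP/s peak at 40% utilization. The function names are made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def training_budget(n_params, tokens_per_param=20.0, flops_per_param_token=6.0):
    """Return (tokens, training FLOPs) for a dense model.

    Assumptions (rules of thumb, not exact laws):
      tokens ~= tokens_per_param * n_params
      FLOPs  ~= flops_per_param_token * n_params * tokens
    """
    n_tokens = tokens_per_param * n_params
    train_flops = flops_per_param_token * n_params * n_tokens
    return n_tokens, train_flops


def raw_text_bytes(n_tokens, bytes_per_token=4.0):
    """Approximate uncompressed text size; bytes_per_token is an assumed average."""
    return n_tokens * bytes_per_token


def training_days(train_flops, n_accelerators,
                  peak_flops_per_accel=1e15, utilization=0.4):
    """Wall-clock days on a hypothetical cluster (peak rate and utilization assumed)."""
    sustained_flops_per_s = n_accelerators * peak_flops_per_accel * utilization
    return train_flops / sustained_flops_per_s / SECONDS_PER_DAY


# The three parameter counts from question 1, on a hypothetical 100k-accelerator cluster.
for n_params in (1e12, 1e13, 1e14):
    n_tokens, flops = training_budget(n_params)
    terabytes = raw_text_bytes(n_tokens) / 1e12
    days = training_days(flops, n_accelerators=100_000)
    print(f"{n_params:.0e} params: {n_tokens:.0e} tokens, "
          f"~{terabytes:,.0f} TB of text, {flops:.1e} FLOPs, {days:,.0f} days")
```

Because the token count is tied to the parameter count here, training FLOPs grow with the square of model size, so each 10× step in parameters costs roughly 100× more compute under these assumptions. Whether those ratios still hold with perfectly curated data is exactly what the questions above are asking.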

Originally posted by u/radhe262772 on r/ArtificialInteligence