I want serious technical estimates from people who understand AI scaling. This is a purely hypothetical scenario, so please don’t derail into “AGI is impossible” or philosophical debates. Assume everything below is already solved.

Assumptions (IMPORTANT)

Imagine we build an AI with:

- Perfect, fully cleaned, verified data (no noise / no misinformation)
- Complete human knowledge:
  - all books
  - all scientific papers
  - all textbooks
  - expert-curated knowledge from top scientists
- Structured + refined datasets
- Best possible modern architecture (transformer or beyond)
- Advanced reasoning methods included
- Tool use (search, code execution, memory systems, simulators)
- Unlimited compute budget

Questions

- Model size (parameters)

  In this scenario, what is the realistic scale of the model?
  - ~1T parameters?
  - ~10T?
  - ~100T?
  - Or does parameter scaling stop mattering here?

- Data size (storage)

  If everything is fully refined and high-quality, how much storage would the dataset actually require?
  - 100 TB?
  - 1 PB?
  - 10–50 PB?
  - More?

  Also assume:
  - deduped data
  - compressed representations allowed
  - no low-quality noise

- Compute requirements

  For training such a system:
  - GPU/accelerator count (order of magnitude)
  - Training time (months / years)
  - Power requirements (rough estimate)

  Would this even be physically feasible?

- Key limitation question

  If we already assume:
  - perfect data
  - perfect architecture
  - perfect reasoning methods
  - perfect tool use

  then what becomes the real bottleneck?
  - compute?
  - memory bandwidth?
  - algorithmic limits?
  - energy?
  - something else?

- Scientific discovery speed

  Most important question: if such a system exists, would it be able to:
  - discover new scientific laws faster than humans?
  - generate new technologies autonomously?
  - replace large parts of research work?

  If yes, how much faster than current human science?
  - 2×?
  - 10×?
  - 100×?
  - or exponential acceleration?

  And what would limit that speed (experiments, compute, real-world testing, etc.)?

Context

I understand current models are limited by scaling laws and data quality. This question is about the upper theoretical bound if those constraints are removed.

TL;DR

If we had:

- perfect knowledge dataset
- best AI architecture
- unlimited compute

what would be:

- model size (TB/PB/parameters)?
- compute scale?
- and scientific discovery speed multiplier?

If you know papers, scaling laws, or serious estimates, please share.

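For anyone who wants to plug numbers into the model size, data size, and compute questions, here is a rough back-of-envelope sketch in Python. It assumes the commonly used approximation that training a dense transformer costs about 6 FLOPs per parameter per token, plus a Chinchilla-style ratio of roughly 20 training tokens per parameter. Every other constant (accelerator count, peak throughput, utilization, watts per accelerator, bytes per token) is a placeholder assumption for illustration, not a measurement or a claim about any real cluster.

```python
# Back-of-envelope training estimate. Every default below is an assumption
# chosen for illustration; swap in your own numbers.
SECONDS_PER_DAY = 86_400


def training_flops(params: float, tokens: float) -> float:
    """Approximate dense-transformer training cost: ~6 FLOPs per parameter per token."""
    return 6.0 * params * tokens


def estimate(
    params: float,
    tokens_per_param: float = 20.0,   # assumed Chinchilla-style ratio
    n_accel: int = 100_000,           # hypothetical accelerator count
    peak_flops: float = 1e15,         # hypothetical peak FLOP/s per accelerator
    utilization: float = 0.4,         # hypothetical fraction of peak actually achieved
    watts_per_accel: float = 1000.0,  # hypothetical power draw incl. overhead
    bytes_per_token: float = 4.0,     # rough rule of thumb for plain text
) -> dict:
    tokens = params * tokens_per_param
    flops = training_flops(params, tokens)
    seconds = flops / (n_accel * peak_flops * utilization)
    return {
        "tokens": tokens,
        "flops": flops,
        "days": seconds / SECONDS_PER_DAY,
        "power_mw": n_accel * watts_per_accel / 1e6,
        "raw_text_tb": tokens * bytes_per_token / 1e12,
    }


if __name__ == "__main__":
    # The three parameter counts floated in the post: ~1T, ~10T, ~100T.
    for n in (1e12, 1e13, 1e14):
        r = estimate(n)
        print(
            f"{n:.0e} params: {r['tokens']:.1e} tokens, {r['flops']:.1e} FLOPs, "
            f"{r['days']:,.0f} days, {r['power_mw']:.0f} MW, "
            f"{r['raw_text_tb']:,.0f} TB of text"
        )
```

Under these placeholder values, compute grows with the square of the parameter count (parameters times tokens, with tokens tied to parameters), so each 10× step in model size costs about 100× more training time on the same hardware. That makes the accelerator count and energy questions above the ones that dominate at the 10T to 100T end, at least within this simple model.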
Originally posted by u/radhe262772 on r/ArtificialInteligence
