I’ve been wondering about this for quite a while. This sub, and r/singularity, seem flooded with coders excited about new models solely because they offer new coding capabilities. But ML is a very specific domain. A narrow ASI focused on coding may or may not be relevant to other domains. So when do we move beyond it?

https://the-decoder.com/ai-agent-benchmarks-obsess-over-coding-while-ignoring-92-of-the-us-labor-market-study-finds/

A study by researchers at Carnegie Mellon and Stanford University finds that current AI agent benchmarks are heavily skewed toward programming tasks, while economically significant fields like management or law remain largely underrepresented. The imbalance extends to individual skills as well: benchmarks primarily evaluate information retrieval and computer-based work, while critical capabilities such as interpersonal interaction are almost entirely ignored. The researchers advocate for more realistic benchmarks that cover underrepresented domains and assess not just outcomes but also the intermediate steps agents take to reach them.
Originally posted by u/AngleAccomplished865 on r/ArtificialInteligence
