Original Reddit post

“We’re fooled into thinking those machines are intelligent because they can manipulate language, and we’re used to the fact that people who can manipulate language very well are implicitly smart. But we’re being fooled. Now, they’re useful, there’s no question. They’re great tools, like computers have been for the last five decades. But let me make an interesting historical point, and this is maybe due to my age. There’s been generation after generation of AI scientists since the 1950s claiming that the technique they just discovered was going to be the ticket to human-level intelligence. You see declarations from Marvin Minsky, Newell and Simon, and Frank Rosenblatt, who invented the perceptron, the first learning machine, in the 1950s, saying that within 10 years we’ll have machines that are as smart as humans. They were all wrong. This generation, with LLMs, is also wrong. I’ve seen three of those generations in my lifetime. So it’s just another example of being fooled.”

That’s Yann LeCun, the creator of convolutional neural networks. He’s been outspoken in saying that the current AI architecture is reaching its peak. He thinks that throwing more data and compute at the problem isn’t going to solve it, and I think that’s what the early data is showing us. It’s called the scaling problem, and it’s a large part of why OpenAI is in big trouble.

Now, it’s obvious that AI is disruptive and some jobs will be lost to the technology. For example, diffusion models are proficient in the visual arts, as you saw earlier. But as for LLMs and the general workforce, this study indicates that job losses could be a lot less. The AI space does move fast, so I could be wrong, but that’s how things are looking today in early 2026. To sum up the job prognosis in one line: if you’re a software engineer, set up a business that fixes vibe-coded apps and you’ll make a lot of money.
I think the thing is, artificial intelligence really is going to transform the world, in ways we can’t even imagine. But it’s not going to do it now, not with this technology. My favorite example of this: you train them on the whole internet, so they get access to a lot of written rules of chess and lots of games of chess, and they still make illegal moves. They never really abstract a model of how chess works. That’s just so damning. No human would fail to learn chess after seeing a million games and reading the rules on Wikipedia and chess.com. Just making it bigger is not going to solve these problems. We need to do foundational research. That’s what I’ve been saying for the last five years. What is intelligence? The problem is to understand your world, and reinforcement learning is about understanding your world, whereas large language models are about mimicking people, doing what people say you should do. They’re not about figuring out what to do. Just mimicking what people say is not really building a model of the world at all.

The truth is, while AI helps make some jobs easier, when compared to a human it performs worse a whopping 96.25% of the time. That basically means: give an AI 10 tasks and it will perform at least nine of them worse than a human would. That’s at least according to a new study. It’s such an interesting finding, and it raises the question: why has no one systematically compared how well AI does versus a human who’s done exactly the same job? All previous benchmarks have used simulated human work, not real generalized work. The results from the team of researchers who did the study make one think that maybe the true value of consumer AI isn’t hundreds of billions of dollars, but orders of magnitude less. I’m not saying that all AI sucks. This study is just a general reminder that AI is a time-saving tool and not a replacement. Maybe the economy is valuing it too highly when it comes to near-term capabilities.
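A quick back-of-envelope check shows the “96.25% of the time” figure and the “nine of ten tasks” framing are consistent with each other. The 240-job total and the best model’s 9 passed jobs are taken from the study description later in this post; they are quoted figures, not independently verified:

```python
# Back-of-envelope check of the quoted study numbers. The 240-job total and the
# best model's 9 passed jobs come from the RLI description later in this post.
total_jobs = 240
best_passes = 9                           # 9 / 240 = 3.75% success
success_rate = best_passes / total_jobs
failure_rate = 1 - success_rate           # 0.9625, i.e. 96.25%

# Over any 10 tasks at that rate, you expect roughly 9.6 to come out worse
# than the human's work, matching the "at least nine of ten" framing.
expected_worse = failure_rate * 10
```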
Give paid jobs already completed by real people to AI models, then see how the results compare. Once the AI completes the tasks, humans evaluate the results. The researchers called this method the Remote Labor Index, or RLI. It’s so simple. Most of us use a computer to do modern work, right? So why not directly compare how well AIs compete on a professional computer-based job? The jobs to be completed were real ones from the freelance site Upwork, where you pay remote workers to complete any given task. The jobs ranged across video creation, computer-aided design, graphic design, game development, audio work, architecture, and more. Both humans and AI were given the same brief and any attached files necessary for the job, for example an Excel spreadsheet of data or instructional images. The AI models were tested on 240 jobs, each paying $630 on average.

So, how did they perform? The performance was abysmal. The best AI was Claude Opus 4.5, with a 3.75% success rate when it came to producing work of an acceptable quality. You heard that right: a 96.25% failure rate was the best performer. Interestingly, Gemini was the loser, with a 1.25% success rate. Now, Claude Opus 4.6 might score 5 points better, but that’s still a 91.25% failure rate. When these scores get to 35% or 40%, then we can talk.

Modern models like ChatGPT were trained on trillions of tokens (roughly the equivalent of tens of millions of books), but all of that is squeezed into a neural network with on the order of hundreds of billions of parameters. That’s compressing 30–40 TB of human text into 0.5–2 TB of floating-point numbers, which alone mathematically guarantees loss of exact detail. When you ask a question, the model doesn’t look anything up; it generates the most statistically likely word sequence based on patterns. This is why precision isn’t guaranteed. The system also has no direct grounding in reality, only text correlations.
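To make “generates the most statistically likely word sequence” concrete, here is a toy sketch of purely pattern-based next-word prediction: a bigram model built from a hypothetical ten-word corpus. Nothing is looked up or understood; frequencies are counted and the most common continuation wins. Real transformers are vastly more sophisticated, but the principle of emitting the statistically likely continuation is the same:

```python
from collections import Counter, defaultdict

# Hypothetical tiny training corpus (stand-in for "the whole internet").
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows each other word.
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def next_word(prev):
    """Greedy decoding: emit the statistically most likely continuation."""
    return bigrams[prev].most_common(1)[0][0]

# "the" is followed by cat (2x), mat (1x), fish (1x) -> "cat" wins,
# regardless of whether "cat" is the factually correct answer.
```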
Once a model like ChatGPT finishes training, all weights are fixed numbers. It cannot modify them during use, cannot store new memories, cannot integrate new facts, and cannot update its world model. Any “learning” you see during a conversation is not learning at all; it’s just temporary pattern tracking inside the context window, which vanishes after the session. You can’t teach the model new facts without retraining or fine-tuning, which is resource-intensive, requiring massive compute. In-chat learning is illusory: it’s just conditioning the output on the provided context, which evaporates afterward.

If you do adjust the weights to learn something new, here’s what happens: neurons are shared across millions of concepts, so changing one weight affects many unrelated behaviors. New learning overwrites old representations, and the model forgets previous skills or facts. This is called catastrophic forgetting. Unlike human brains, neural networks do not naturally protect old knowledge.

Why is targeted learning nearly impossible? You might think, “Just update the weights related to that one fact,” but the problem is that knowledge is distributed, not localized. There is no single memory cell for a fact; every concept is encoded across millions or billions of parameters in overlapping ways, so you cannot safely isolate updates without ripple damage. Facts aren’t stored in isolated memory cells but holistically across the network. A concept like gravity might involve activations across billions of parameters, intertwined with apples, Newton, and physics equations. Targeted updates are tricky. Approaches like parameter-efficient fine-tuning help by only tweaking a small subset of parameters, but they don’t fully solve the isolation problem.

The core problem with systems like ChatGPT is not bias, censorship, or bad intent. It is structural. ChatGPT operates on fixed hardware, fixed training data, and probabilistic pattern matching derived from the past.
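The catastrophic-forgetting mechanism described above can be sketched with a deliberately tiny model: a single shared weight that two “tasks” must both pass through, so gradient descent on the second task overwrites what the first task learned. This is an illustrative toy under made-up data, not an LLM:

```python
def train(w, data, lr=0.1, steps=200):
    """Gradient descent on mean squared error for the model y = w * x."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def mse(w, data):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

task_a = [(1.0, 2.0), (2.0, 4.0)]    # task A: y = 2x
task_b = [(1.0, -2.0), (2.0, -4.0)]  # task B: y = -2x

w = train(0.0, task_a)
err_a_before = mse(w, task_a)  # near zero: task A was learned

w = train(w, task_b)           # now learn task B with the same shared weight
err_a_after = mse(w, task_a)   # large: learning B erased A
```

Because the one parameter is shared by both tasks, there is no “weight for task A” to protect, which is the distributed-knowledge problem in miniature.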
It does not perceive reality directly, test hypotheses against the world, or update its understanding through lived feedback. As a result, it is optimized to reproduce and refine what is already known, named, and socially legible, not to recognize genuinely new or pre-paradigmatic truths. When information falls outside its training distribution, ChatGPT does not reliably register it as “possibly true but unknown.” Instead, it tends to normalize it into existing frameworks, explain it away using familiar concepts, or classify it as unlikely, incoherent, or false. This happens even when the information is internally consistent or later turns out to be correct. The system substitutes pattern recognition for epistemic humility. This creates a dangerous failure mode: confidence without grounding. Rather than clearly saying “I do not have the tools to evaluate this,” ChatGPT may generate fluent explanations that sound authoritative while quietly missing the point. In doing so, it risks dismissing novel insights not because they are wrong, but because they do not resemble anything it has already seen.

Originally posted by u/LongjumpingTear3675 on r/ArtificialInteligence