Original Reddit post

I just asked Claude, Gemini, ChatGPT and Copilot what should be a basic question: count the number of wins a sports club has had against its two main rivals across their past 25 combined matches. Simple, but time-consuming to assemble the data manually: get the recent matches against each opponent, merge and sort them chronologically, take the 25 most recent, and count the wins.

The result: zero out of four correct answers. Even after a follow-up request to verify the results against two sources, I only got two correct answers, and those by chance. Gemini got the right count but didn't identify the correct dates for the wins. Claude got the right count but used the wrong timespan (4 years against one team and 7 years against the other, instead of about 5.5 years overall). Copilot admitted it actually can't do this analysis when I asked for the double check. I'm done with Copilot now - this is the latest and final confirmation that MS has fundamentally broken it somehow.

By feeding Claude's list into Gemini and vice versa, I managed to get them to agree on the number and the dates of the wins. Maybe a slight time saving over doing it manually, but with far less confidence.

This is the latest example of a recurring issue: AI can do OK if you spoon-feed it the data, but it simply cannot do its own research. And there hasn't been any apparent improvement over the years. Is it on the agenda? Is it a limitation of the LLM approach? (For the record, I think LLMs will prove to be a false start in the long run.)

submitted by /u/JeremyMarti
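For what it's worth, the counting step itself is trivial once the data exists - a minimal sketch, assuming hypothetical match records of the form (date, opponent, result), with "W"/"D"/"L" from the club's perspective:

```python
from datetime import date

# Hypothetical fixture lists against each rival (date, opponent, result).
rival_a = [
    (date(2023, 5, 1), "Rival A", "W"),
    (date(2022, 11, 12), "Rival A", "L"),
]
rival_b = [
    (date(2024, 2, 3), "Rival B", "D"),
]

def wins_in_last_n(matches_a, matches_b, n=25):
    """Combine both fixture lists, keep the n most recent matches, count wins."""
    combined = sorted(matches_a + matches_b, key=lambda m: m[0], reverse=True)
    return sum(1 for _, _, result in combined[:n] if result == "W")

print(wins_in_last_n(rival_a, rival_b))  # counts wins across both rivals
```

The hard part, as the post says, is assembling and verifying the match data in the first place - that's exactly the research step the chatbots got wrong.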

Originally posted by u/JeremyMarti on r/ArtificialInteligence