Since I studied psychology, I have access to an IQ test: the IST2000R, from 2007. It is no longer the most modern test, but I was curious how Gemini (free version, fast model) would perform on it. The beauty of this test is that it does not only report one overall IQ score, which is fairly useless for real-life applications, but also 9 different subscores:

Complete the sentence
Analogies
Similarities
Arithmetic tasks
Number series
Arithmetic symbols
Figures
Cube tasks
Matrices

How does it work?

Each subtest consists of 20 items, so each subscore has a raw score from 0 to 20 and a normalized "IQ value" on a scale where 100 is the average and 15 is the standard deviation. So 115 (one standard deviation above average) is a quite good result, and due to the nature of this test, a value of around 130 is usually the maximum anyone can reach, even with every answer right. If you need to measure higher scores, you need a specialized test.

How did I do it?

I have a copy of each physical page with the questions. I dragged each page into Gemini and let it answer the questions. The test usually takes about 1-2 hours. Gemini, of course, needed just 5 minutes, and only because I dragged the pages in quite carefully; otherwise it would have been even faster. Whenever possible, I had Gemini write out each question first, so I could be sure it had read it correctly. This was not possible for the Matrices, Cube and Figure tasks, because those are visual problems.

To the results (raw score X out of 20 -> normalized IQ value):

Complete the sentence: 15/20 -> 113
Analogies: 17/20 -> 123
Similarities: 16/20 -> 118
Arithmetic tasks: 20/20 -> 131
Number series: 14/20 -> 105 (here it correctly worked out the pattern in almost every task but then failed at simply adding the numbers up. I gave it 2 chances and it still kept making the simplest mistakes.)
Arithmetic symbols: 20/20 -> 122
Figures: 3/20 -> 81
Cube tasks: 7/20 -> 92
Matrices: 2/20 -> 78

Complete the sentence, Analogies and Similarities can be combined into the “Verbal” score.
Gemini reached 48 points, which translates to 120 standardized IQ points.

Arithmetic tasks, Number series and Arithmetic symbols can be combined into the “Numerical” score. Gemini reached 54 points, which translates to 121 standardized IQ points.

Figures, Cube tasks and Matrices are the “Visual” tasks. The raw score here is 12 out of 60, which translates to 78 IQ points. These are pictures that have to be mentally manipulated, and this is obviously the absolute weakest point of an LLM. It might be able to create pictures, but it does not understand what is really going on in a picture at all. Here it performed worse than it would have by just guessing.

This results in a total raw score of 114 and a total IQ score of 107. With 107, Gemini is only slightly above average, and that is solely because it has no chance of interpreting those graphics. Yet in these tasks I also asked it how confident it was in its answers, and it always said 90% or higher. If Gemini had also scored around 50 points in the visual tasks, as it did in the verbal and numerical ones, the overall IQ would have been around 125-130, almost as high as the test goes.

What do you think? Are you surprised by any of this?
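For readers who want to double-check the numbers: the composites are simply sums of the subtest raw scores, while the step from raw score to IQ value comes from the test's own norm tables, which are not reproduced in the post and are not modeled here. The sketch below (helper names such as `composite_raw_scores` and `iq_percentile` are hypothetical) only re-adds the raw scores reported above and, assuming the usual normal distribution behind the 100/15 IQ scale, converts a few of the reported IQ values into approximate population percentiles.

```python
import math

# Raw scores (out of 20) as reported in the post for each IST2000R subtest.
RAW_SCORES = {
    "Complete the sentence": 15,
    "Analogies": 17,
    "Similarities": 16,
    "Arithmetic tasks": 20,
    "Number series": 14,
    "Arithmetic symbols": 20,
    "Figures": 3,
    "Cube tasks": 7,
    "Matrices": 2,
}

# Grouping of subtests into the three composites, as described in the post.
COMPOSITES = {
    "Verbal": ["Complete the sentence", "Analogies", "Similarities"],
    "Numerical": ["Arithmetic tasks", "Number series", "Arithmetic symbols"],
    "Visual": ["Figures", "Cube tasks", "Matrices"],
}


def composite_raw_scores(raw, composites):
    """Sum the subtest raw scores that belong to each composite."""
    return {name: sum(raw[s] for s in subtests) for name, subtests in composites.items()}


def iq_percentile(iq, mean=100.0, sd=15.0):
    """Percent of a normal(mean, sd) population scoring below `iq` (assumes normality)."""
    z = (iq - mean) / sd
    return 50.0 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    sums = composite_raw_scores(RAW_SCORES, COMPOSITES)
    print(sums)                # {'Verbal': 48, 'Numerical': 54, 'Visual': 12}
    print(sum(sums.values()))  # 114
    # Approximate percentiles for some of the reported IQ values.
    for iq in (78, 107, 120, 121, 130):
        print(iq, round(iq_percentile(iq), 1))
```

The sums match the 48 / 54 / 12 and total of 114 given in the post; the percentile step is only an interpretation aid for the 100/15 scale and says nothing about how the IST2000R norms themselves were built.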
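Whether a visual score counts as "worse than guessing" depends on how many answer options each item offers, which the post does not state. A minimal sketch, assuming every item is answered at random with equally likely options: the expected score is items / options, and the binomial distribution gives the chance of scoring at or below a given raw score by pure guessing. `N_OPTIONS = 5` is a placeholder assumption only, to be replaced with the real count from the test booklet; the function names are hypothetical.

```python
from math import comb


def expected_guess_score(n_items, n_options):
    """Expected number of correct answers when every item is answered at random."""
    return n_items / n_options


def prob_at_most(k, n_items, n_options):
    """P(X <= k) for X ~ Binomial(n_items, 1/n_options): chance of scoring k or less by guessing."""
    p = 1.0 / n_options
    return sum(comb(n_items, i) * p**i * (1 - p) ** (n_items - i) for i in range(k + 1))


if __name__ == "__main__":
    N_ITEMS = 20
    N_OPTIONS = 5  # ASSUMPTION: placeholder, the post does not give the number of answer options
    for name, raw in [("Figures", 3), ("Cube tasks", 7), ("Matrices", 2)]:
        chance = expected_guess_score(N_ITEMS, N_OPTIONS)
        p_low = prob_at_most(raw, N_ITEMS, N_OPTIONS)
        print(f"{name}: {raw}/20, guessing average {chance}, P(score <= {raw}) = {p_low:.3f}")
```

Under the five-option placeholder, guessing would average 4 of 20 per subtest, so Figures (3) and Matrices (2) would fall below chance while Cube tasks (7) would sit above it; with a different option count the comparison shifts accordingly.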
Originally posted by u/MildlyMoodyMango on r/ArtificialInteligence
