Original Reddit post

A Stanford study (co-authored by Fei-Fei Li) asked LLMs to answer questions that required an image, without actually providing the image. The models solved the questions with very high accuracy just by guessing the contents of the image from the prompt, even on questions from a private dataset. From the Stanford Chair of Medicine:

"Models performed well without, and a little better with, the images. In one case, our no-image model outperformed ALL of the current models on the chest x-ray benchmark—including the private dataset—ranking at the top of the leaderboard. Without looking at a single image."

Source: https://xcancel.com/euanashley/status/2037993596956328108
The study: https://arxiv.org/abs/2603.21687

Originally posted by u/Tolopono on r/ArtificialInteligence