Original Reddit post

The Institute of the Estonian Language (EKI) has released an open benchmark for evaluating LLM performance in Estonian. The benchmark goes beyond simple language understanding and evaluates multiple dimensions, including:

• Estonian language proficiency
• Reasoning and problem-solving
• Factual accuracy
• Resistance to propaganda and manipulative prompts
• Reliability across different tasks

One interesting result is that leading models show significant differences in their susceptibility to narrative steering and propaganda-style prompting. Models that perform well on general benchmarks do not necessarily perform equally well when tested in a smaller-language information environment.

The benchmark and results are publicly available: https://moodupuu.eki.ee/

This is a useful example of why evaluating LLMs only on English-centric benchmarks can miss important weaknesses that become visible in smaller languages and local information ecosystems.

I’d be interested to hear how people here approach evaluation for non-English languages and whether propaganda/manipulation resistance should become a standard benchmark category.

Originally posted by u/Unable_Negotiation_6 on r/ArtificialInteligence