The Institute of the Estonian Language (EKI) has released an open benchmark for evaluating LLM performance in Estonian.

The benchmark goes beyond simple language understanding and evaluates multiple dimensions, including:

• Estonian language proficiency
• Reasoning and problem-solving
• Factual accuracy
• Resistance to propaganda and manipulative prompts
• Reliability across different tasks

One interesting result is that leading models show significant differences in their susceptibility to narrative steering and propaganda-style prompting. Models that perform well on general benchmarks do not necessarily perform equally well when tested in a smaller-language information environment.

The benchmark and results are publicly available: https://moodupuu.eki.ee/

This is a useful example of why evaluating LLMs only on English-centric benchmarks can miss important weaknesses that become visible in smaller languages and local information ecosystems.

I’d be interested to hear how people here approach evaluation for non-English languages and whether propaganda/manipulation resistance should become a standard benchmark category.

Originally posted by u/Unable_Negotiation_6 on r/ArtificialInteligence
