eifachposte

I've been following the gap between open and closed models. GLM 4.7, released last December, scores 73.8% on SWE-bench Verified, comparable to Claude Sonnet at around 77% and GPT-5.1 at around 76%. I tested it against Sonnet on real coding work for 3 weeks.

Context: 356B-parameter MoE model (32B active), open source architecture, trained by Zhipu AI.

Benchmark claims: SWE-bench Verified 73.8%, Terminal Bench 2.0 41%, multilingual SWE-bench 66.7%.

Real-world testing: backend debugging, refactoring, automation scripts.

Where it competed with Sonnet: Multi-file refactoring, where it tracked imports across the codebase accurately. Debugging, where it identified root causes at a similar rate. Bash automation, where it was actually better than Sonnet, with fewer syntax errors. Iterative problem solving, where it adjusted its approach when the first solution failed.

Where Sonnet is ahead: Architectural design, meaning explaining system patterns and tradeoffs. Recent tech, since Sonnet is trained on 2025 data while GLM's cutoff is mid/late 2024. Teaching, meaning breaking down the "why" versus just implementing.

The interesting part is that an open model is reaching competitive quality in a specialized domain (coding) with API pricing around 1/5th that of closed models. The cost barrier for AI-assisted development is dropping significantly.

Limitations observed: General knowledge is weaker than frontier models. Explanation quality is lower; it's better at doing than teaching. Training data recency is 6-12 months behind.

Cost analysis: Sonnet API around $70 monthly for my usage, GLM API around $15 monthly for the same usage, which saves around $55 monthly.

The broader questions: are we seeing specialization emerge as a path to competitive open models? Does training on domain-specific data like code and math let open models compete in niches? What happens when multiple specialized open models cover different domains at competitive quality?

After 3 weeks of usage: it handles 60-70% of the tasks where I previously used Sonnet. Saved around $45 in API costs. The quality difference is noticeable but not dealbreaking for implementation work.

I'm not claiming open models have caught up overall, but in specific domains like coding and terminal automation the gap is narrowing fast.
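
For anyone who wants to rerun the cost arithmetic with their own numbers, here is a minimal sketch. The dollar figures are the ones quoted in the post; the function names and the 4-weeks-per-month proration are assumptions made for illustration, not how the savings above were actually tracked.

```python
# Rough cost comparison using the monthly figures quoted in the post.
# Function names and the weeks-per-month value are illustrative assumptions.

SONNET_MONTHLY_USD = 70.0  # approximate Sonnet API spend from the post
GLM_MONTHLY_USD = 15.0     # approximate GLM API spend for the same usage


def monthly_savings(closed_usd: float, open_usd: float) -> float:
    """Absolute monthly savings from switching all usage to the open model."""
    return closed_usd - open_usd


def cost_ratio(closed_usd: float, open_usd: float) -> float:
    """Open-model cost as a fraction of closed-model cost."""
    return open_usd / closed_usd


def prorated_savings(closed_usd: float, open_usd: float,
                     weeks: float, weeks_per_month: float = 4.0) -> float:
    """Savings over a partial month, assuming spend is spread evenly."""
    return monthly_savings(closed_usd, open_usd) * weeks / weeks_per_month


if __name__ == "__main__":
    print(monthly_savings(SONNET_MONTHLY_USD, GLM_MONTHLY_USD))        # 55.0
    print(round(cost_ratio(SONNET_MONTHLY_USD, GLM_MONTHLY_USD), 3))   # 0.214
    print(prorated_savings(SONNET_MONTHLY_USD, GLM_MONTHLY_USD, 3))    # 41.25
```

The ratio of about 0.21 lines up with the "around 1/5th" pricing claim, and a naive 3-week proration lands near $41, in the same range as the roughly $45 reported. The proration assumes everything moved to GLM, which is a simplification.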

Originally posted by u/Technical_Fee4829 on r/ArtificialInteligence

Open source llm (glm 4.7) matching closed models on coding benchmarks. Tested via api on real projects.
