I’ve been using GPT models for coding tasks since 5.1 came out and copy/pasting code from Claude since Claude 1.2, and I’ve never had major issues beyond the egregious shit Anthropic/OpenAI occasionally pulls on us. After the inane levels of hype GLM 5.2 received all over Reddit, I gave the new kid in town a fair shake to see if I could break free from the reins of Mr. Gippity.

I ran a bunch of GitHub issues and UI/UX qualms through GPT 5.5 high as well as GLM 5.2 running through Neuralwatt (~200M tokens altogether). Both models were able to solve almost all the issues, but not once did GLM 5.2 do a better job. More than once I had to restart its chain of thought because it went into a death spiral of “wait, actually…”. Its code was excruciatingly verbose, and it evidently could not understand the meaning of the word “terse” when I asked it not to give me a wall of text on every response. I even tweaked its parameters, to no avail.

What’s more, GLM 5.2 often consumed 50-100x as many tokens to complete the same tasks. What is the point of a cheaper per-token model if it burns them all up the wazoo? I can see its cost-efficiency being incredibly useful for small tasks, but even there its token burn is quite outrageous, even on low thinking settings.

It almost feels to me like all the Chinese models are approximating the behavior of the frontier models, which I take as a sign of extreme levels of distillation. Am I the only one who feels this way, or am I just crazy?
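To put the token-burn complaint in numbers, here is a rough back-of-the-envelope sketch. The prices and token counts are made-up placeholders, not the real rates for GPT 5.5 or GLM 5.2; the only point is that a per-token discount stops mattering once the token multiplier exceeds the price ratio.

```python
# Back-of-the-envelope cost comparison.
# All prices and token counts here are HYPOTHETICAL placeholders,
# not actual GPT 5.5 or GLM 5.2 pricing.

def task_cost(price_per_million_tokens: float, tokens_used: int) -> float:
    """Dollar cost of one task at a flat per-token price."""
    return price_per_million_tokens * tokens_used / 1_000_000


def break_even_token_ratio(frontier_price: float, budget_price: float) -> float:
    """How many times more tokens the budget model can burn
    before it costs as much as the frontier model."""
    if budget_price <= 0:
        raise ValueError("budget_price must be positive")
    return frontier_price / budget_price


if __name__ == "__main__":
    FRONTIER_PRICE = 10.00    # $/1M tokens, assumed
    BUDGET_PRICE = 0.50       # $/1M tokens, assumed
    FRONTIER_TOKENS = 100_000 # tokens the frontier model needs, assumed

    ratio = break_even_token_ratio(FRONTIER_PRICE, BUDGET_PRICE)
    print(f"break-even token multiplier: {ratio:.0f}x")

    frontier_cost = task_cost(FRONTIER_PRICE, FRONTIER_TOKENS)
    for multiplier in (1, 50, 100):
        budget_cost = task_cost(BUDGET_PRICE, FRONTIER_TOKENS * multiplier)
        verdict = "cheaper" if budget_cost < frontier_cost else "not cheaper"
        print(f"{multiplier:>3}x tokens: ${budget_cost:.2f} vs ${frontier_cost:.2f} ({verdict})")
```

With these assumed numbers the budget model only wins while it uses fewer than 20x the tokens, so a 50-100x burn wipes out the discount. Plug in real prices and your own token counts to see where your break-even sits.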
Originally posted by u/0xCUBE on r/ClaudeCode
