Original Reddit post

Note: I wrote the first draft of this myself. Reddit's automoderation rejected it, so I had Kimi rewrite it. Kimi's not great, and I'm not recommending it, but apparently humans aren't allowed to post on Reddit anymore; everything has to pass through an LLM filter first. Here's the sanitized version of what I actually think.

Executive Summary

I've been a heavy Anthropic user since well before this year. Opus 4.5, released last November, was a genuine breakthrough. Since then, the trajectory has been more downhill than up. But here's the critical insight: the models themselves aren't the problem. The bottleneck is the harness: the interface, context management, and execution layer that sits between the user and the weights. Switching harnesses has produced bigger correctness gains for me than switching models. Claude Code, in particular, is a bad harness. In my testing, you can move from roughly 50% to roughly 90% correctness on the same Anthropic models simply by changing the harness. That isn't model performance; that's interface design.

The Timeline: From Breakthrough to Frustration

I've been in the ecosystem for a long time, many moons before the general hype cycle kicked into overdrive this January. Opus 4.5 in November was the real deal: coherent, capable, and reliable in ways that previous iterations hadn't been. Then the bloat started. Not in the weights, but in the wrapper. The post-January experience has been a steady degradation in practical utility, not because the base model got dumber, but because the systems around it got heavier, more prescriptive, and less effective.

The Harness Hypothesis

There's a dangerous misconception in AI discourse right now: that model benchmarks are user experience. They aren't. The models (Opus, Sonnet, the entire Anthropic lineup) are genuinely excellent in isolation. The differentiator in 2026 isn't the weights; it's the harness.
When I say "harness," I mean the full stack: how context is managed, how tool use is orchestrated, how reasoning chains are structured, and how much of the model's actual capacity reaches the user versus getting eaten by system overhead. I switched to other models partly because I switched harnesses, and the harness change made the biggest impact. The models are all roughly in the same tier now; the harness is what separates 50% correctness from 90% correctness.

Claude Code: A Case Study in Bad Harness Design

Claude Code is a crappy harness. Full stop. It locks up, introduces unnecessary abstraction layers, and wastes model capacity on telemetry and guardrails rather than user outcomes. You can verify this yourself: take the same Anthropic model, run it through Claude Code, then run it through a leaner harness. The correctness delta is massive, and it's reproducible. Check it out.

The tragedy is that Anthropic has the best models in the industry and one of the worst production harnesses for serious work.

The Pi Alternative

Consider Pi's harness: clean, low-overhead, and it gets out of the model's way. If Anthropic allowed Claude Code subscription tiers to run on something like Pi's architecture (lean, minimal telemetry, maximum model exposure) two things would happen immediately:

1. Usage would drop significantly. A better harness requires fewer turns to reach correct answers. Fewer turns means lower token burn, which is cheaper for subscribers and cheaper for Anthropic to serve.
2. Perceived model quality would jump. Same weights, better results, because the harness isn't fighting the user.

Why This Won't Happen (The Incentive Misalignment)

I doubt Anthropic will do this, and here's why: part of the bloat in Claude Code is telemetry collection. The harness is heavy because it's designed to extract usage data, not to optimize user outcomes. That's a product strategy decision, not a technical necessity.
Telemetry could sit behind an API anyway; there's no engineering reason it needs to be baked into the execution layer this aggressively. But the current architecture suggests the harness is optimized for data collection first and user correctness second.

The Pricing Reality Check

And while we're being honest: the Anthropic API pricing is absurd. By my estimate it's overpriced by a factor of 50x to 100x relative to what the economics should bear at scale. That's not sustainable for individual power users or small teams. It pushes people toward subscriptions, which would be fine if the subscription product weren't handicapped by a bad harness. So the market dynamic becomes: subscriptions (with a broken harness) or competitors (with better harnesses, possibly worse models). Users are forced to choose between great models in a straitjacket or mediocre models that can actually breathe.

The Ask

Opus and Sonnet are better in a proper harness. Claude Code is not that harness. Could Anthropic please just allow the alternative?

- Decouple model access from the harness.
- Let subscription users opt into minimal, high-performance interfaces.
- Appreciate that many of us are paying for the models, not the telemetry wrapper around them.

The models are the product. The harness should be an option, not a cage.

Bottom Line

Stop bloating the harness. Stop treating the interface as a data-extraction surface. Let the models perform. The technology is there. The only thing in the way is the wrapper.
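[Editor's illustration] To make the "harness" framing above concrete, here is a minimal sketch of the loop a lean harness strictly needs: send context, execute the tools the model asks for, stop at the first answer. Nothing here is Anthropic's or Pi's actual implementation; the model and tool are stubs, and every name (`lean_harness`, `stub_model`, `run_tool`) is hypothetical.

```python
# Minimal sketch of a lean harness loop, with the model stubbed out.
# The point is how little machinery sits between user and weights:
# no telemetry layer, no extra abstraction, just the turn loop.

def stub_model(messages):
    """Stand-in for a model API call: asks for one tool, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"type": "tool_call", "name": "read_file", "args": {"path": "notes.txt"}}
    return {"type": "answer", "text": "done"}

def run_tool(name, args):
    """Stand-in for tool execution (file reads, shell commands, etc.)."""
    return f"<contents of {args.get('path', '?')}>"

def lean_harness(user_prompt, model=stub_model, max_turns=8):
    """Drive the model-tool loop: build context, run requested tools,
    return the first final answer along with the number of turns used."""
    messages = [{"role": "user", "content": user_prompt}]
    for turn in range(1, max_turns + 1):
        reply = model(messages)
        if reply["type"] == "answer":
            return reply["text"], turn
        result = run_tool(reply["name"], reply["args"])
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("turn budget exhausted")

answer, turns = lean_harness("summarize notes.txt")
print(answer, turns)  # → done 2
```

The post's claim, in these terms, is that every layer a harness adds on top of this loop (telemetry hooks, prescriptive scaffolding) consumes turns and context budget before the model's capacity ever reaches the user.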

Originally posted by u/Maleficent-Movie-625 on r/ClaudeCode