eifachposte

Anthropic dropped a post on effective harnesses for long-running agents: https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents

Their core point: long-horizon agents lose continuity between context windows. Each new session starts blank. Agents overreach, declare done prematurely, leave the repo a mess, and skip real verification. Their fix is structured handoff artifacts (progress files, feature lists) and role-specialized agents (initializer + coder).

Good post. But reading it I kept thinking the framing undersells something:

The environment is not a thing the harness operates on. The environment IS the harness.

A harness isn’t just the orchestration code or the scaffolding scripts. It’s everything the agent reads and writes: the filesystem, repo conventions, CLAUDE.md, progress files, tool outputs, hook injections, even how your tests are named.

Nature figured this out a billion years ago. It’s called stigmergy: ants don’t pass memos shift-to-shift, they drop pheromone trails. The environment carries the state. The next ant just reads the ground.

DNA is a progress file. Sleep is context compaction. Apprenticeship is an initializer agent. Specialization of cells/ants/brain-regions is a multi-agent system with a shared signaling layer. Evolution selects at handoff boundaries, not mid-life.

Nature solved long-running agents by externalizing state into durable, readable artifacts. That is exactly what Anthropic is proposing, just with 3.5B years of head start.

So if the environment IS the harness, the corollary is: a messy repo is a broken harness, no matter how clever your orchestration code is. You can’t out-scaffold a codebase that doesn’t tell the next agent where it is.

But that’s only half. The harness still has to be tuned on three axes:

Per model: Haiku needs tighter slices and more explicit verification than Opus. Tool-use style and caching strategy shift too.
Per task: a refactor wants an impact map and test diff. A research task wants notes and sources. A long build wants a progress file and feature list. One shape doesn’t fit all.
Per environment: cargo vs pnpm, monorepo vs single package, what “done” means here. The harness has to speak the local idiom.

The mental model I keep coming back to: the environment is the substrate the harness writes onto. The harness’s job is to shape that substrate, along with the model’s prompts and tools, so a fresh agent can land cold and be productive in minutes. Treat repo state as a first-class harness output, not a side effect of the work.

Curious what others have found. Are you tuning per-model? Per-task? Or just running one harness and hoping?
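
To make the three axes a bit more concrete, here is a rough Python sketch of picking one setting per axis and then writing the result into the repo as a handoff file. Everything in it is an assumption for illustration: the `HarnessProfile` fields, the slice sizes, the artifact filenames, and `progress.json` are made up, not taken from Anthropic's post or any real harness. Only `cargo test` and `pnpm test` are real commands.

```python
import json
import tempfile
from dataclasses import dataclass, asdict
from pathlib import Path

# Hypothetical values throughout: the numbers, artifact names, and
# "progress.json" are illustrative assumptions, not from Anthropic's post.

# Per model: the smaller model gets tighter slices and mandatory verification.
MODEL_AXIS = {
    "haiku": {"max_files_per_slice": 2, "verify_every_slice": True},
    "opus": {"max_files_per_slice": 8, "verify_every_slice": False},
}

# Per task: which handoff artifacts the next session should expect.
TASK_AXIS = {
    "refactor": ["impact_map.md", "test_diff.md"],
    "research": ["notes.md", "sources.md"],
    "build": ["progress.json", "feature_list.json"],
}

# Per environment: the local command that defines "done".
ENV_AXIS = {"cargo": "cargo test", "pnpm": "pnpm test"}


@dataclass
class HarnessProfile:
    max_files_per_slice: int
    verify_every_slice: bool
    handoff_artifacts: list
    test_command: str


def build_profile(model: str, task: str, env: str) -> HarnessProfile:
    """Combine one choice per axis into a single profile."""
    return HarnessProfile(
        **MODEL_AXIS[model],
        handoff_artifacts=list(TASK_AXIS[task]),
        test_command=ENV_AXIS[env],
    )


def write_handoff(repo: Path, profile: HarnessProfile, done: list, next_step: str) -> Path:
    """Persist state into the repo so a fresh session can read it cold."""
    path = repo / "progress.json"
    state = {"done": done, "next_step": next_step, **asdict(profile)}
    path.write_text(json.dumps(state, indent=2))
    return path


def read_handoff(repo: Path) -> dict:
    """What a new session would do first: read the ground."""
    return json.loads((repo / "progress.json").read_text())


if __name__ == "__main__":
    profile = build_profile("haiku", "build", "cargo")
    with tempfile.TemporaryDirectory() as tmp:
        write_handoff(Path(tmp), profile, done=["scaffold"], next_step="add login form")
        print(read_handoff(Path(tmp))["next_step"])
```

The point of the sketch is that the profile ends up in the repo, not in the orchestrator's memory. The next session only needs `read_handoff` and the filesystem to know what was done, what comes next, and which command counts as verification.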

Originally posted by u/Relative_Register_79 on r/ClaudeCode

The harness IS the environment: rethinking what "effective harness" actually means after Anthropic's post
