Original Reddit post

Most multi-agent Claude Code is a planner LLM, N subagents, and a prayer. The piece that’s almost always missing is the decision-making layer that picks how to think about the problem before any agent touches it. The first move on every project should be Cynefin classification. Clear, Complicated, Complex, Chaotic. The classification picks the decomposition strategy. Clear gets Functional decomposition. Complicated gets Layer-based. Complex gets Risk-based, highest uncertainty first, time-boxed probes, no end-date estimation at all. The planner doesn’t get to freestyle a task tree. The epistemic regime constrains it. Specs should be conjunctions of sub-hypotheses with explicit Bayesian priors. Joint P(all) = P(H1) × P(H2) × … × P(Hn). If the product falls below 20%, no infrastructure work happens until a Phase 0 Validate-or-Kill sprint resolves the existential risks first. Kill conditions are written into the spec during pre-mortem, before motivation can distort them. A belief tracker shows whether new evidence is actually moving probabilities or just generating activity. Sunk cost gets engineered out at the spec level. Gates aren’t checkpoints. They’re a scheduled rotation of cognitive tools, timed to where each one has leverage. At phase start: Cynefin classify (which regime are we in) Polya (what is the problem actually asking) Inversion / pre-mortem (how does this fail) Decompose (per the regime-picked strategy) Theory of Constraints (what is the bottleneck right now) Fermi (size with order-of-magnitude estimates) During the phase: Bayesian execution (update priors as evidence arrives) At gate review: 8-bias audit + Five Whys Three biases tripped triggers a mandatory pause and reframe from scratch. The Gate Decision Matrix is four-valued: Go, Iterate, Pivot, Kill. “Fix it” isn’t a legal exit. Kill is first-class, and the conditions that would justify it are pre-registered as direct output of the pre-mortem. Theory of Constraints is runtime state, not a slogan. The current bottleneck gets written to disk and updated at every gate. Non-bottleneck agents wait rather than pile up WIP. Idle is diagnostic, not failure. Most multi-agent stacks treat unassigned agents as wasted capacity and pile on. That’s the opposite of the right move. Subordinating non-bottleneck work to the actual constraint is what makes the work converge. Everything is falsifiable. Every mutating tool call writes (task_id, agent_id, diff_hash SHA256, caller_kind, outcome) to an append-only audit log. Bash and Python implementations are bit-identical so attribution can’t drift between languages. A calibration miss counter makes self-review falsifiable: executor self-reports NONE/LOW, independent adversarial reviewer says HIGH/CRITICAL, that’s a miss. Over 50% miss rate triggers a retraining advisory. Self-review without a paired second voice isn’t review. It’s hope. The system shape is the point. Building agentic systems is mostly the work of making decisions falsifiable. Classify before you plan. Hypothesize with priors. Gate with cognitive tools rotated by where they have leverage. Attribute every mutation to a hash. Let Kill be a first-class outcome of any gate. The decomposition strategy is picked by the epistemic regime, not by what the planner LLM feels like generating. If you’re running multi-agent Claude Code and can’t tell me what your routing decision was before planning began, what’s your stop condition when the joint probability of all your assumptions falls through the floor? submitted by /u/Js4days

Originally posted by u/Js4days on r/ClaudeCode