I was building a motion analysis feature for an iOS health app that watches how someone walks and flags fall risk in elderly patients. The gait analysis engine at the core of it accepted one format: H.264 video, AAC audio, a specific resolution range. Feed it anything else and it threw this and stopped: VideoConversionError: codec mismatch — expected H.264/AAC, received HEVC/AC3 Three times in one week. My Claude Code converter was supposed to handle format normalization before the video reached the engine. It didn’t, not reliably. One agent was doing everything: inspect the file, decide on parameters, run ffmpeg, validate output, structure the data for iOS. When it failed, the error could have come from any of those steps. By step 3 it was making decisions based on half-remembered output from step 1, not the actual file metadata. The fix was 4 Claude Code agents running in parallel from a single Orchestrator, wired together with the Superpowers plugin, tool execution routed through Harbor , retry persistence handled by Temporal , and per-agent run visibility from Agno . ORCHESTRATOR │ Coordinates. Delegates. Never touches files. │ ├──────────────────┬──────────────────┬────────────────── │ │ │ │ [parallel] │ [parallel] │ [parallel] ▼ ▼ ▼ Inspects file. Confirms gait Extracts target Returns JSON. engine is up. spec for ffmpeg. │ │ │ └──────────────────┴──────────────────┘ │ │ [merge + hand off] ▼ Runs ffmpeg. Returns converted path. │ ▼ u/validator PASS or FAIL. Structures iOS payload. The Orchestrator fans out to three agents simultaneously: the Format Analyst inspects the video, a pre-flight engine check confirms the gait analysis endpoint is reachable and accepting input, and a metadata prep agent pulls the target spec for ffmpeg. All three run in parallel. The Conversion Engine starts only once all three return. That fan-out cut pipeline latency on a 10-video batch by roughly 40% compared to running the same steps sequentially. The wiring The Superpowers plugin (free, 94,000+ GitHub stars, on Anthropic’s official marketplace) adds u/agent invocation to Claude Code. Install it once: claude mcp add superpowers The Orchestrator fires all three parallel agents in one instruction block: analyze /path/to/patient-walking-session.mp4 confirm gait-engine endpoint is accepting input extract target spec for H264/AAC 1280x720 Once all three respond, the Orchestrator passes only what the Conversion Engine needs: convert /path/to/input.mp4 to H264/AAC target-resolution 1280x720 The key constraint: pass only what the next agent needs. Early on I forwarded the full metadata JSON from the Format Analyst to the Conversion Engine. It started making decisions on fields it had no business reading. The Orchestrator now extracts the relevant keys from each parallel response and merges just those before passing downstream. The CLAUDE.md definitions Each agent gets a narrow CLAUDE.md . The “What You Do Not Do” block is not optional. Without it, agents fill in what they think you want. Orchestrator (key lines): You coordinate. You never write code. You never execute ffmpeg. Fan out to , , and in parallel. Wait for all three. Then call u/conversion-engine with merged output. One retry maximum on FAIL. Log the reason. Return the error to a human. Format Analyst (full output contract): Return ONLY this JSON: “codec_video”: “H.264” , “bitrate_kbps”: number | null, “duration_seconds”: number | null, “meets_spec”: boolean, “mismatch_reason”: string | null } Run: ffprobe -v quiet -print_format json -show_streams [filepath] Do not convert. Do not suggest fixes. Return the JSON and stop. The three problems the diagram doesn’t show Context bloat. The Orchestrator was accumulating tool responses from every parallel agent across a long batch session. By the tenth video it was carrying output from nine it no longer needed, and that dead weight was affecting decisions on the current one. Harbor sits between the Orchestrator and the tools as an execution layer. Each agent requests a tool call, Harbor scopes what that agent is allowed to run, executes it in a sandbox, and writes the result to a shared workspace. The Orchestrator never holds tool output in its context window — Harbor holds it. Files, artifacts, and traces persist there across the full batch run. The Format Analyst can’t accidentally touch the gait engine endpoint, and the engine-check agent can’t write to disk, because Harbor grants each one only what its scope allows. Retry persistence. When the Validator returned FAIL and triggered a retry, the Orchestrator had no durable memory of why the first attempt failed. Temporal stores workflow state between steps, so a failed conversion restarts from exactly where it stopped rather than re-running the full parallel fan-out from scratch. Run visibility. Per-agent cost across parallel runs was opaque. Agno’s built-in runtime surfaces per-agent execution metrics without a separate observability stack, which matters once you’re processing batches of 50+ videos and need to know which agent is burning tokens. The pipeline has been stable for three weeks. submitted by /u/Deep_Structure2023
Originally posted by u/Deep_Structure2023 on r/ClaudeCode
