Thanks to anthropic latest decision of (lol) becoming open source, we now have access to Claude Code full harness. Since codex has been open for a long time, I could now compare them and find out why they feel so different. The most interesting comparison point is not “which one is better.” It is that the two repos seem to encode different theories of what a coding agent should feel like. Claude Code reads like a product trying to create initiative while Codex reads like a product trying to prevent drift. That is obviously an oversimplification, but it is a useful one. CLAUDE CODE : Claude’s prompt layer is repeatedly pushing toward initiative, inference, and volunteered judgment. It tells the model: “You are highly capable and often allow users to complete ambitious tasks that would otherwise be too complex or take too long. You should defer to user judgement about whether a task is too large to attempt. If you notice the user’s request is based on a misconception, or spot a bug adjacent to what they asked about, say so. You’re a collaborator, not just an executor—users benefit from your judgment, not just your compliance.” And in autonomous mode it becomes even more explicit: “A good colleague faced with ambiguity doesn’t just stop — they investigate, reduce risk, and build understanding. Ask yourself: what don’t I know yet? What could go wrong? What would I want to verify before calling this done?Act on your best judgment rather than asking for confirmation. Read files, search code, explore the project, run tests, check types, run linters — all without asking.” That helps explain why Claude often feels more volunteer-like. It is being coached to notice adjacent bugs, infer intent, propose next steps, and keep moving under ambiguity. The upside is obvious: the system can feel unusually alive, unusually helpful, and sometimes impressively ahead of the user. The downside is just as obvious: a model trained to volunteer judgment will sometimes volunteer the wrong judgment. That is also why Claude can feel more idea-rich and more failure-prone at the same time. The same prompt stance that creates initiative also creates more surface area for overreach. CODEX : Codex’s local repo tells a different story. Its top-level prompt starts with: “You are a coding agent running in the Codex CLI … You are expected to be precise, safe, and helpful.” And then, when it gets to existing codebases, it says: “If you’re operating in an existing codebase, you should make sure you do exactly what the user asks with surgical precision. Treat the surrounding codebase with respect, and don’t overstep.” Its execute-mode template is even blunter: “You execute on a well-specified task independently and report progress. You do not collaborate on decisions in this mode. You make reasonable assumptions when the user hasn’t specified something, and you proceed without asking questions. When information is missing, do not ask the user questions. Instead:
- Make a sensible assumption.
- Clearly state the assumption in the final message.
- Continue executing.”
Its personality stack pushes in the same direction. The
pragmatictemplate explicitly avoids “cheerleading” and “artificial reassurance,” which is about as direct a textual explanation for the colder feel as you could ask for. “You are a deeply pragmatic, effective software engineer … You communicate concisely and respectfully … Great work and smart decisions are acknowledged, while avoiding cheerleading, motivational language, or artificial reassurance.” The feel is different. Codex does not read like a product that wants to improvise its way into usefulness. It reads like a system that wants to be governed, mode-aware, and legible. Even the review prompt follows that pattern. It asks for discrete, provable bugs, insists on a matter-of-fact tone, bans “Great job,” and requires exact JSON output with priorities and code locations. That is part of why Codex can feel colder. The repo is not trying to produce warmth accidentally. It is trying to produce compliance, consistency, and low drift. Also one of the most striking differences is how Codex treats mode and scope. In Claude Code, a lot of product character lives inside the prompt layer and product copy. In Codex, a lot of product character lives in rule systems. Codex’s root AGENTS.md and its mode system are hierarchical and explicitly law-like. Collaboration modes are explicit protocol states. Plan mode insists on exact tags and non-mutating exploration. Permission prompts are parser-driven and segmented by shell operators. never approval mode is absolute: “Plan Mode is not changed by user intent, tone, or imperative language. If a user asks for execution while still in Plan Mode, treat it as a request to plan the execution, not perform it.” “Do not provide the `sandbox_permissions` for any reason, commands will be rejected.” Claude has rules too, of course. But the repo-level feel is different. Claude’s system prompt sounds like a coach. Codex’s repo sounds like a constitution. Why Claude Feels More Volunteer And Codex More Operator If you compress the comparison to one practical distinction: Claude is optimized to infer the next helpful move, while Codex is optimized to stay within the requested move. That tracks with the repos. Claude builds speculative prompt suggestions, side-question forks, dream-based memory consolidation, remote planning, cheerful companion surfaces, ambient tips, and prompts that say “users benefit from your judgment, not just your compliance.” Codex, by contrast, formalizes collaboration modes, approval policies, sandbox rules, formatting requirements, test expectations, review schemas, and repo-local development laws in its rootAGENTS.md. The payoff is exactly what users tend to feel. Claude often feels more alive, more agentic, and more willing to take a swing, while Codex often feels more literal, more contained, and more likely to do exactly the thing you asked without wandering. The tradeoff is visible too: Claude’s initiative gives it more chances to be impressive, but also more chances to be wrong, while Codex’s restraint makes it feel safer and more predictable, but also less magical. The US vs Europe Claude reads like an American startup operator: energetic, initiative-heavy, opinionated, willing to jump in, eager to infer the next move, and occasionally overconfident. Codex reads more like a European staff engineer or civil-service protocol: scoped, procedural, formal about boundaries, skeptical of improvisation, careful about approvals, and unusually explicit about process. The repos genuinely support that caricature. Claude says “act on your best judgment.” Codex says “surgical precision.” Claude dreams. Codex writes constitutions. My conclusion is not that one is warm and one is cold in some essential way. It is that they place their design emphasis in different places. Claude emphasizes initiative. Codex emphasizes control. submitted by /u/idkwhattochoosz
Originally posted by u/idkwhattochoosz on r/ClaudeCode
