Original Reddit post

I’ve been trying to figure out where the community has landed on this, because I genuinely can’t tell. A year ago, the answer seemed obvious: if you’re building anything non-trivial with LLMs, you need structured scaffolding (PRDs, memory layers, agent roles, task breakdowns). Frameworks like BMAD-METHOD, Agent OS, Superpowers, and SpecKit (and their cousins) exist precisely because raw LLMs drift, forget context, and produce spaghetti if you don’t constrain them with specs upfront.

But now I look at Claude Code and Codex, for example, since they are the ones I’m using, and they feel… different. Claude Code does its own task decomposition, maintains context across files, and can self-correct mid-session without you babysitting a spec document. Codex feels similar: it reasons about the codebase, not just the prompt.

So I’m genuinely asking:

- Do you still scaffold everything with a spec framework before touching Claude Code / Codex?
- Or do you drop straight into vanilla agentic mode and only reach for a framework when things break down?
- Or do spec frameworks matter more now, because you’re giving these powerful agents more autonomy and the upfront spec is the only guardrail you have?

A year ago, I built a mid-complexity product (a new vertical on top of an existing platform) using Claude Code and Codex with Agent OS as a seat belt. I have a few projects coming up and I’m trying to decide whether investing in proper spec scaffolding is a force multiplier or just overhead that the model now handles natively.

Would love to hear from people who’ve shipped something real with either approach: not theory, actual experience.

Worth mentioning: I’m originally a Product Leader who vibe codes now, not a software engineer.

Originally posted by u/3abwahab on r/ClaudeCode