eifachposte


It looks like we don’t have agreed-upon best practices in this new era of building software. I think that’s partly because it’s all so new and folks are still overwhelmed, and partly because everything changed so fast. I feel last November (2025) was a huge leap forward, and then Opus 4.5 was another big one. I’d like to share the stack that has worked well for me after months of exploring different setups, products, and models. I’d also like to hear good advice so that I can improve. After all, my full-time job is building, not trying AI tools, so there could be big gaps in my knowledge.

Methodology and Tools

I chose spec-driven development (SDD). It’s a significant paradigm shift from the IDE-centric coding process. My main reason for choosing SDD is future-proofing: SDD fits well with an AI-first development process. It has flaws today, but it will “self-improve” as AI advances. Specifically, I force myself not to read or change code unless absolutely necessary.

My workflow:

1. Discuss the requirement with Claude and let it generate a PRD and/or design docs.
2. Use Opuspad (a markdown editor in Chrome) to review and edit. Iterate until the specs are finalized.
3. Use Codex to execute. (Model-task matching is detailed below.) I have a skill that enforces an observe-change-verify loop. An explicit verification step is critical, because all these CLIs seem to see themselves as coding assistants rather than autonomous agents, so they expect a human in the loop at a very low level.
4. Let Claude review the result and ship.

I stopped using Cursor and Windsurf because I decided to adopt SDD as much as possible. I still use Antigravity occasionally when I have to edit code.

Comparing SOTA solutions

Claude Code + Opus feels like a staff engineer (L6+). It’s very good at communication and architecture. I use it mainly for architectural discussions and for understanding the technical details (since I restrain myself from reading code). For complex coding, it’s still competent, but less reliable than Codex.
Sonnet, unfortunately, is not good at all; it just can’t find a niche. For very easy tasks like git rebase, push, simple docs, etc., I just use Haiku. For anything serious, Sonnet’s token savings can’t justify its unpredictable quality.

Codex + GPT 5.4 is like a solid senior engineer (L5). It’s very technical and detail-oriented, and it can go deep to find subtle bugs. But it struggles to communicate at a higher level. It assumes I’m familiar with the codebase and every technical detail, again like many L5s at work. For example, it uses file names and line numbers as the context of the discussion. Claude does this much less often, and when it does, it also pastes the code snippet for me to read.

Gemini 3.1 Pro is underrated, in my opinion. Yes, it’s less capable than Claude and Codex on complex problems, but it still shines in specific areas: pure frontend work and relatively straightforward jobs. I find Gemini CLI does those much faster and slightly better than Codex, which tends to over-engineer. Gemini is like an L4.

Which plans do I subscribe to?

I subscribe to the $20 plans from OpenAI, Anthropic, and Google. The token quota is enough even for a full-time dev job. There’s a nuance: you can generate much more value per token with a strong design. If your design is bad, you may end up burning tokens without getting far. But that’s another topic. The main benefit is getting to experience what every frontier lab is offering.

Google’s $20 plan hasn’t been popular on social media recently, but I think it’s well worth it. Yes, they cut the quota in Antigravity, but they’re still very generous with agentic browser usage, etc.

Codex is really generous with tokens on the Plus plan. Some say ChatGPT Plus gives you more tokens than Claude Max. I do feel Codex has the highest quota at the moment, and its execution power is even greater than Claude’s. Sadly, its communication is a bummer if you want to be as SDD-heavy as I am.

Claude is unbeatable as a product. In fact, although its quota is tiny, Claude is irreplaceable in my stack. Without it, I’d have to talk with Codex, and the communication cost would triple.
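To make the model-task matching and the observe-change-verify loop concrete, here is a minimal sketch under stated assumptions. It is not the actual skill or setup described above. The names `ROUTES`, `route`, `LoopResult`, and `observe_change_verify` are hypothetical. The three callables stand in for whatever commands or agent calls a real skill would run, such as collecting test output, prompting the coding agent, and rerunning the spec’s checks.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical routing table summarizing the model-task matching in this post.
ROUTES = {
    "spec": "Claude Code + Opus",    # discussing requirements, writing PRD/design docs
    "review": "Claude Code + Opus",  # reviewing the result before shipping
    "complex_code": "Codex",         # deep, detail-oriented implementation and bug hunting
    "frontend": "Gemini CLI",        # pure frontend and relatively straightforward jobs
    "trivial": "Haiku",              # git rebase/push, simple docs
}


def route(task_kind: str) -> str:
    """Look up the tool for a task kind; raises KeyError for unknown kinds."""
    return ROUTES[task_kind]


@dataclass
class LoopResult:
    passed: bool
    iterations: int
    last_report: str


def observe_change_verify(
    observe: Callable[[], str],
    change: Callable[[str], None],
    verify: Callable[[], Tuple[bool, str]],
    max_iterations: int = 5,
) -> LoopResult:
    """Repeat observe -> change -> verify until verify passes or the budget runs out."""
    report = ""
    for i in range(1, max_iterations + 1):
        state = observe()       # e.g. collect failing test output or logs (assumed)
        change(state)           # e.g. ask the coding agent to edit code given that state (assumed)
        ok, report = verify()   # e.g. rerun the acceptance checks from the spec (assumed)
        if ok:
            return LoopResult(passed=True, iterations=i, last_report=report)
    return LoopResult(passed=False, iterations=max_iterations, last_report=report)


if __name__ == "__main__":
    # Toy stand-ins: each "change" fixes one of three failing checks.
    fixed = {"count": 0}
    result = observe_change_verify(
        observe=lambda: f"failing checks: {3 - fixed['count']}",
        change=lambda state: fixed.update(count=fixed["count"] + 1),
        verify=lambda: (fixed["count"] >= 3, f"applied {fixed['count']} fixes"),
    )
    print(route("frontend"), result)
```

Running it prints `Gemini CLI LoopResult(passed=True, iterations=3, last_report='applied 3 fixes')`. The `max_iterations` cap is one assumed design choice. When no human sits in the loop, the agent needs an explicit stopping condition, and a failed result can be handed back to the reviewing model.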

I would like to hear your thoughts: whether there are things I missed, whether there are tools better suited to my methodology, or whether there are flaws in my thinking.

Originally posted by u/3BetYourAss on r/ClaudeCode

Sharing my stack and requesting advice to improve
