I have been a heavy Claude Code user since Claude 3.5. I mean every day, for hours, on the $200 plan. Since Claude 4.7 things have been weird. It was simply missing everything, not listening, not understanding the requests.

Today I wanted to merge 2 entities into 1, which required a tough migration: changing the server, making tests pass, and changing the SDK, admin, and storefronts. Claude Code failed for 2 days. It was deleting things it shouldn’t, and it kept stopping on its own to ask if it could proceed, even though I couldn’t have been clearer that I wanted it to do the full migration. Claude even stated that it is MAD and ANNOYED, which is extremely weird??? It was literally doing the work in that spirit as well, just trying to hack it at all costs. I felt like I was annoying the tool, and it is very concerning that they trained it like that.

I decided to test Codex, as it seemed like a great opportunity. It merged the entities on the backend, made the tests pass (I checked, and it didn’t delete them like Claude Code did), then made the init scripts pass and used Playwright to test the admin and the actual consumers. BACK-TO-BACK.

The comparison can’t be put into percentages. Claude Code was around 5% there and Codex nailed it. Specifically, it communicated extremely clearly. It followed natural logic and it was telling me things the way old Claude Code did: cleanly and with clarity. I tried Codex in the past (last year) and it was overcomplicating things, but right now Claude is the one doing that. It is like they switched.

I was using Codex 5.5 on high, on the $100 plan, for this task, and it did a 10k-line refactor. This task alone made that $100 pay for itself, even if I don’t use it for the rest of the month. There shouldn’t even be a comparison.

I believe Claude Code itself will be rewritten from scratch, as they are moving backwards. A company that wanted to destroy engineers will be forced to admit that its main tool didn’t do a good enough job to write itself. It is the most ironic end, and thus it will happen.
Originally posted by u/0xdjole on r/ClaudeCode

Been testing Codex for about a month. So far it’s been pretty good, but I did get the feeling that some days it was very smart and on others it used up my limits way too fast. It’s hard to say whether I didn’t use it properly or whether it’s a common experience.
You’ve got to remember a few key factors. Every new session has a random seed that affects how it picks tokens. Filling the context window complicates its logic. Long sessions can accumulate contaminants in the session/context history. The amount of compute available affects performance.
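To make the random seed point concrete, here is a toy sketch of seeded sampling. It is not how Codex or Claude Code actually work internally; the `logits` table, the token names, and the `sample_token` and `run` helpers are all made up for illustration. It only shows that the same scores sampled with different seeds can lead to different choices, while the same seed repeats itself.

```python
import math
import random

# Hypothetical scores for three possible next actions (made-up numbers).
logits = {"merge": 2.0, "delete": 1.5, "ask": 1.0}

def sample_token(logits, temperature, rng):
    """Pick one token using softmax-with-temperature sampling."""
    tokens = list(logits.keys())
    scaled = [logits[t] / temperature for t in tokens]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(tokens, weights=weights, k=1)[0]

def run(seed, steps=5):
    """Sample a short sequence of tokens from a session with the given seed."""
    rng = random.Random(seed)  # each "session" gets its own seeded generator
    return [sample_token(logits, 1.0, rng) for _ in range(steps)]

if __name__ == "__main__":
    for seed in (1, 2, 3):
        print(seed, run(seed))
```

Running `run` twice with the same seed gives the same sequence, while different seeds will usually give different ones, which is one possible reason two sessions on the same task can feel so different.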