Original Reddit post

Hello folks! In the past month, even with the endless complaints about tokens burning like coal, I have started relying heavily on a hook that burns quite a few tokens at the beginning of a new implementation, but that is paying dividends in code quality.

First and foremost, I use a very structured approach to my code. Both the API and the front end are built around the idea of "modules": self-contained pieces of functionality. Moreover, I use Neo4j as my DB, with a custom-written layer that transforms graph data into JSON:API (and vice versa when I send data from the front end). This is how I like to work, and I need my code to be consistent.

In the past I used a bloated CLAUDE.md, which did not work well: CC was hallucinating like crazy. I then started adding a ToC to CLAUDE.md that redirected to various markdown documents containing the true information about how the architecture worked. This worked a little better, but CC was still often "forgetting to read the architecture documentation", which led to hallucinations.

In the latest iteration of my search for code generation consistent with my architecture, I created a small skill containing the information on where certain things can be found, linking lots of small markdown files with the architecture details. I then added a hook that fires when CC wants to write or edit a file, something like this:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Edit|Write|MultiEdit",
        "hooks": [
          {
            "type": "command",
            "command": ".claude/hooks/remind-architecture.sh",
            "timeout": 5
          }
        ]
      }
    ]
  }
}
```

and of course a shell script that builds the correct links (as I work in git worktrees).

The results are both good and bad. Let's start with the bad ones: enforcing this burns more tokens than I would like. We are talking about 70k to 100k tokens. On the positive side, though, I can tell you that very rarely is the code I get a hallucination, or code that does not follow my architecture.
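The post doesn't include the `remind-architecture.sh` script itself, so this is only a minimal sketch of what such a PreToolUse hook script might look like. The doc paths and the wording of the reminder are assumptions for illustration; the real script presumably does more work to build worktree-specific links.

```shell
#!/bin/sh
# Hypothetical sketch of .claude/hooks/remind-architecture.sh (not the author's
# actual script). It resolves the root of the *current* git worktree, so the
# doc links stay valid no matter which worktree CC is running in, then prints
# a reminder pointing at the architecture docs before any Write/Edit happens.

# --show-toplevel returns the top level of the current worktree;
# fall back to the working directory outside a git repo
ROOT="$(git rev-parse --show-toplevel 2>/dev/null || pwd)"

# The paths below are made up for this sketch
cat <<EOF
REMINDER: before writing or editing code, consult the architecture skill and docs:
- ${ROOT}/docs/architecture/modules.md
- ${ROOT}/docs/architecture/neo4j-jsonapi.md
EOF
```

A script like this exits with status 0, so it never blocks the tool call; it only injects the reminder into the conversation each time CC is about to touch a file.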
I use superpowers, and I generally have to remind CC to use the skill when it writes the spec and the plan. I found that I need to avoid long debugging and "swearing sessions" where the code refuses to work, or violates my architecture so badly that it is just an unidentifiable bug waiting to crash the system in edge cases. So, on one side, I burn more tokens at a time when limits seem to be lower than usual. PS: I use the same setup with GLM-5.1, and the difference there is MASSIVE as well. So I've decided to value quality over quantity: I may hit some limits earlier than usual, but I know I will have to fix much less garbage and far fewer architecture violations! submitted by /u/nicoracarlo

Originally posted by u/nicoracarlo on r/ClaudeCode