I recently came across a paper by researchers at ETH Zurich arguing that extensive repo-level .MD file use hurts output quality and (obviously) increases token usage: not only through the tokens consumed by the instructions themselves, but also because those instructions trigger deeper exploration, which consumes still more tokens. The paper argues that LLM-generated .MD files hurt the most, because they largely repeat what is already written in the code and yield no positive effect. Human-written .MD files that are kept to a minimum and focus only on things the model could not deduce from the codebase itself seem to yield at least a minimal positive impact, but only for smaller models.

Here's the second half of the abstract:

> Across multiple coding agents and LLMs, we find that context files tend to reduce task success rates compared to providing no repository context, while also increasing inference cost by over 20%. Behaviorally, both LLM-generated and developer-provided context files encourage broader exploration (e.g., more thorough testing and file traversal), and coding agents tend to respect their instructions. Ultimately, we conclude that unnecessary requirements from context files make tasks harder, and human-written context files should describe only minimal requirements.

Posting this for awareness and to hear what others think. Personally, I use context files to control the order of operations and to specify tool use where it needs specification. I also have a LESSONS_LEARNED.md file that I am thinking of removing: once a lesson is codified in the codebase itself, it seems redundant anyway.
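For illustration, a minimal context file in the spirit of the paper's recommendation might contain only facts the agent could not infer from the codebase. This is a hypothetical sketch; the file name, commands, and paths are made-up examples, not from the paper:

```markdown
# CLAUDE.md

- Run `make integration-test` before committing; plain `make test` skips the DB suite.
- The `legacy/` directory is frozen for a vendor audit; do not modify files there.
- API keys come from the team vault, never from `.env` files in this repo.
```

Everything here is external context (process, policy, secrets handling) rather than a restatement of what the code already says, which is exactly the distinction the paper draws.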
Originally posted by u/wifestalksthisuser on r/ClaudeCode
