eifachposte

Quick thing I want to talk about. This is just me thinking out loud, not a tutorial. And before anyone wonders, I’m not from a coding background. My background is psychology. But for the last few years I’ve been building with LLMs day in, day out, and there’s something I keep noticing.

One quick bit of setup. This post is for anyone BUILDING with LLMs, so apps, tools, agents, pipelines. Not for people typing prompts into ChatGPT or Claude in a chat window. The whole thing I’m about to describe only makes sense when you’re the one wiring the LLM into a bigger system and the prompts are getting built automatically in the background.

Now that many people are using LLMs for all kinds of things, building tools for themselves, building applications, LLMs are becoming much more integrated into the ecosystem of applications and code. LLMs living next to code is actually quite a new thing, only a few years old, so I don’t blame anyone for not having thought about this yet. But there’s a blind spot worth talking about.

LLMs, due to their training, have a tendency to give certain outputs. We know this. It’s intrinsic to all models. And yet people still trust the LLM to do things it’s actually pretty bad at.

A typical example. Ask an LLM to generate a random name. Ask it again. And again. Guess what would probably happen? It’s going to keep picking from the same small handful. If you’ve worked with Claude for any length of time you’ve probably noticed names like “Chen” or “Maya” coming up over and over. I’m sure most of you have seen this. Now imagine you’ve built an app and you’re trusting the LLM to generate names. You think you’re getting variety. You’re not.

And it’s not just names. It’s the same thing with locations. Same thing with about a thousand other categories where you assume the LLM is being random. Ask for a random anything enough times and the favourites come out fast. The model has a centre of gravity and it falls into it.

So what do you do?
Well, this is where it stops being about you writing prompts and starts being about you, as the builder, using code to work against the limits the LLM has.

Imagine your app sends a bigger prompt to the LLM. Not “give me a random location” as the entire prompt. A bigger task where one of the ingredients it needs is a random location. Instead of asking the LLM to come up with that location itself, you, in code, hold a huge repertoire of locations, thousands of them, and on each turn you pick one at random with an actual random number generator. Then in the prompt that your app builds and sends to the LLM, you have a placeholder for the location, and you inject the one you picked. The LLM never generates the location. It just receives “use this location” inside the bigger task.

That’s the move. The LLM isn’t being asked to be random. Your code is being random. The LLM is just doing the part it’s actually good at, which is the surrounding language.

And you can go deeper than just random. You can build an algorithm on top. Let’s say you’re on a travel platform and you’ve got user data, preferences, history, climate they like, all of that. You feed those inputs into an algorithm that, depending on the inputs, picks a different category of location, and then randomly picks from inside that category. So it’s still random, but it’s random inside parameters that actually mean something. Way more variety AND way more relevance than the LLM doing it alone, because the LLM’s repertoire is much more limited than you think, and now the variety is coming from a pool you control.

But this goes even deeper, because there are actually two over-reliance traps, not one.

Trap 1 is asking the LLM to do things it’s measurably worse at. Random variety, picking from large pools, anything where the model has favourites it’ll fall back to. Even when you hand it a reference list to pick from, the LLM has positional bias and tends to favour certain positions in the list, often the items near the top. Random code picks uniformly.
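
The pick-in-code idea above can be sketched roughly like this in Python. Everything here is made up for illustration: the `LOCATIONS` pool, the `choose_category` rule, and the prompt wording are assumptions, not anything from a real system. A real pool would hold thousands of entries and the category rule would use real user data.

```python
import random

# Hypothetical pool grouped by category. A real one would be far larger.
LOCATIONS = {
    "beach": ["Zanzibar", "Algarve", "Phu Quoc", "Tulum"],
    "mountain": ["Dolomites", "Banff", "Chamonix", "Hakuba"],
    "city": ["Porto", "Tbilisi", "Osaka", "Montreal"],
}

# Illustrative prompt with a placeholder that code fills in.
PROMPT_TEMPLATE = (
    "Write a two-sentence travel teaser for a trip to {location}. "
    "Use exactly this location and do not suggest another one."
)


def choose_category(user: dict) -> str:
    """Toy rule (an assumption): map a climate preference to a category."""
    climate = user.get("preferred_climate")
    if climate == "warm":
        return "beach"
    if climate == "cold":
        return "mountain"
    return "city"


def build_prompt(user: dict, rng: random.Random) -> str:
    """Pick the location in code, then inject it into the prompt."""
    category = choose_category(user)
    location = rng.choice(LOCATIONS[category])  # uniform pick over the category
    return PROMPT_TEMPLATE.format(location=location)


if __name__ == "__main__":
    rng = random.Random()
    print(build_prompt({"preferred_climate": "warm"}, rng))
```

The model only ever sees the finished prompt, so the variety comes from `rng.choice` over a pool you control, not from whatever the model happens to favour.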
So code literally produces more variety than the LLM, even when the LLM has been given the exact same list to choose from.

Trap 2 is asking the LLM to do work that a tiny piece of code could do just as well. This is the one most people don’t think about. And it really bites you in production, where latency and cost are very important.

Honestly, this trap is the one that clicked for me first. I was just kind of sitting there waiting for one of my pipelines to finish, a whole series of LLM calls running in sequence, and it was taking too long. And that’s when I had the realisation. Hey, a lot of the stuff happening in here doesn’t actually need to be done with an LLM at all.

A few things to watch for in Trap 2.

If you want output in a specific structure or format, that’s code. Don’t burn LLM tokens shaping things. Have the LLM produce the content. Let code do the shape.

If you have something that the LLM is supposed to always output the same, every single time, for example a piece of fixed boilerplate, or a template section that never changes, that’s code. The big reason is that you’re paying tokens, on every single call you ever make, to regenerate output identical to the last call. That’s wasted tokens AND wasted time forever. There’s also a smaller risk worth mentioning, which is that depending on the model you’re using, the LLM might not even reproduce that “always the same” content reliably. It can condense it, drop bits, replace a chunk of fixed rules with a placeholder like “[full rules continue here].” The best models today are much better at this than they used to be, so this is more of a tail risk now than a guaranteed problem, but it’s still a reason to not lean on the LLM for it.

To give you a real number from something I worked on, I had an LLM in one of my pipelines that was using about 40% of its output tokens just transcribing static text it had been given in its own input.
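
A minimal sketch of “the LLM produces the content, code does the shape”, assuming a made-up page layout. `fake_llm`, `STATIC_RULES`, and `PAGE` are hypothetical placeholders, not a real API or the setup described in this post. The point is only that the fixed text is assembled by code and never passes through the model’s output.

```python
from string import Template

# Fixed text that never changes between calls. It lives in code,
# so the model never has to re-emit it.
STATIC_RULES = "House rules:\n1. Be kind.\n2. No spoilers."

# The layout is also code: the model fills in $body and nothing else.
PAGE = Template("== $title ==\n\n$body\n\n$rules\n")


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call (hypothetical). Returns only the variable part."""
    return "Welcome back! Here is today's summary."


def render(title: str, prompt: str, llm=fake_llm) -> str:
    """Ask the model for content only, then wrap it in the fixed template."""
    body = llm(prompt).strip()
    return PAGE.substitute(title=title, body=body, rules=STATIC_RULES)


if __name__ == "__main__":
    print(render("Daily digest", "Write a one-line greeting and summary."))
```

Because the boilerplate is appended after the call, it costs no completion tokens and cannot be condensed or dropped by the model.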
I replaced that work with deterministic code that assembled the static text on the server side, and got a 41% drop in completion tokens and a 38% drop in latency, with the same final output, on every generation, forever.

And on a bigger scale, I had a whole job in one of my pipelines that I just substituted completely. The LLM was doing five things, and when I actually looked at each one of them, every single thing was a deterministic transform. Regex. String operations. Table lookups. So I removed the LLM from that job entirely and replaced it with code. A roughly 2 second LLM call became a few milliseconds. Zero API cost on every message after that.

And it goes further than format and lookups. Some of the work in my own pipelines, work you’d assume only an LLM can do, work that LOOKS like it needs the LLM to actually understand it, turned out to be a pure state machine. Numbers tracked across turns, updated by rules I wrote. The LLM’s only job is to tell the state which way to nudge each turn. Everything else, including what gets read out of those numbers and feeds into what happens next, is pure math. No LLM.

Now, the obvious question. When SHOULD you keep the LLM?

For me the line is simple. Creativity is something code is never going to have. And anything where I need to capture meaning, semantic context, the actual SENSE of what someone is saying, I’m not going to rely on regex and keyword matches alone for that. That’s where the LLM earns its keep. Anywhere you genuinely need linguistic judgement, the LLM stays. Everywhere else, push it down to code.

And just to be very clear about this, none of what I’ve said above is an either/or thing. Regex, keywords, deterministic algorithms, and LLMs all work together, in the same pipeline. In my own systems I have all four of them sitting next to each other.

And here’s where it gets really interesting. You can actually chain LLMs and code together, where each one does the part it’s good at. Imagine this.
An LLM somewhere in your pipeline gives a semantic read on something, what the user actually means, the feel of what they just said. That read goes into deterministic code, which categorizes it together with other things the system already knows about. The code structures all of that into clean, organised inputs. Then those structured inputs get fed into ANOTHER LLM, and that’s the LLM that makes the final decision about whether some particular thing should happen or not, for example whether a particular piece of content gets injected into the main prompt that goes to the model that talks to the user.

So the shape is LLM → code → LLM. The first LLM is doing the semantic understanding. The code in the middle is doing the categorizing and structuring, which is what LLMs are bad at and code is great at. The second LLM is making the judgement call with everything already cleanly laid out for it. None of those three steps could do the whole job on its own. Together they can.

The way I think about it now is this. The LLM is one tool inside a pipeline, it isn’t the brain of the whole thing. But neither is the code, on its own. The brain is the architecture itself, the flow of information, the web of LLM and code intertwined, managing each other. The LLM is the part you call when you need creative variance or semantic understanding. The code is the part that structures, categorizes, organizes, and connects everything. The brain is how all of that is woven together.

If you’ve read all the way to here, a toast to you. Have a good day.
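
For anyone who wants the LLM → code → LLM shape in concrete form, here is a rough Python sketch. It is an illustration under assumptions, not the pipeline described in this post. `stub_reader` and `stub_decider` stand in for the two model calls (in a real system those would be actual LLM calls, not keyword checks), and the `frustration` counter, its clamp range, and its threshold are invented for the example. The middle step is the only part meant literally: plain rules that update a number across turns and hand over structured inputs.

```python
from dataclasses import dataclass


@dataclass
class State:
    frustration: int = 0  # hypothetical number tracked across turns


# How each semantic read nudges the state (rules written by the builder).
NUDGE = {"negative": 1, "neutral": 0, "positive": -1}


def stub_reader(message: str) -> str:
    """Placeholder for LLM #1, which would return a semantic read of the message."""
    return "negative" if "not working" in message.lower() else "neutral"


def update(state: State, read: str) -> dict:
    """Deterministic middle step: apply the nudge, clamp to 0..5, structure the inputs."""
    state.frustration = max(0, min(5, state.frustration + NUDGE.get(read, 0)))
    level = "high" if state.frustration >= 3 else "low"
    return {"read": read, "frustration": state.frustration, "level": level}


def stub_decider(inputs: dict) -> bool:
    """Placeholder for LLM #2, which would judge whether to inject extra content."""
    return inputs["level"] == "high"


def run_turn(message: str, state: State, reader=stub_reader, decider=stub_decider) -> bool:
    read = reader(message)          # LLM: semantic understanding
    inputs = update(state, read)    # code: state, categories, structure
    return decider(inputs)          # LLM: final judgement call


if __name__ == "__main__":
    state = State()
    for msg in ["It is still not working"] * 3:
        print(run_turn(msg, state), state)
```

Swapping the two stubs for real model calls leaves `update` untouched, which is the part that runs in microseconds and costs nothing per message.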

Originally posted by u/Kai_ThoughtArchitect on r/ClaudeCode

Being Over-Reliant On LLMs Is The Death Of High-Quality Architecture
