I’ve been building my own trading bot. Not arbitrage, not HFT. Just something that actually tries to catch real moves, short and long term. The whole stack was built with Claude Code (CC): the agent loop, the hard-coded risk layer, the dashboard, the broker integration, the tests. All of it. And I’m learning as I go, which means I’m making mistakes with real capital. That part isn’t theoretical.

When I started, I did the thing everyone’s doing right now: two agents. One for equities, one for prediction markets (Kalshi). I ended up shutting the prediction-market side down. My takeaway: LLMs reason a lot better when they have something empirical to work with. Fundamentals, earnings, real data. When the task is basically “guess this probability better than the market,” it just becomes a guessing game. The model wasn’t adding edge. It was adding variance.

I’m also not trying to build some giant alternative-data feed engine to manufacture an edge. That’s a real strategy. It’s just a game for people with the compute and infra to win it. I wanted to see how far a lean setup with good reasoning could go.

On the equities side, things started to turn. I reworked the prompts and tightened the “guardian”: a hard-coded risk layer that sits between the model and the broker and can’t be argued with. After that, the curve started behaving. Less bleed. More patience. A few green positions. I’m not going to tell you I cracked it. The sample is tiny, and a lot of the green is still unrealized. But it stopped grinding down, which after the early months felt like progress.

Here’s the part I actually wanted to share. When Opus 4.8 dropped, I had Claude review the whole system. Code it wrote itself. I was weighing whether to add more capital. First pass, I pointed it at the obvious stuff: the trade data, the current prompts, the guardian. It was skeptical. It said the apparent edge was muddied by the prediction-market swings and the early testing noise, and there wasn’t enough clean signal to scale. Fair.
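For anyone wondering what a “guardian” like this can look like in code, here’s a minimal, hypothetical sketch (not the author’s implementation). The idea is that the model only proposes orders and a plain deterministic function decides whether they reach the broker, so no amount of reasoning text can talk its way past the limits. The `Order` shape, the `guardian_check` name, the limits (`max_position_pct`, `max_daily_loss_pct`, `max_order_qty`) and the no-shorting rule are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    symbol: str
    side: str        # "buy" or "sell"
    quantity: int
    price: float


@dataclass(frozen=True)
class GuardianLimits:
    # Hypothetical limits; tune to your own risk tolerance.
    max_position_pct: float = 0.10    # max share of equity in any one symbol
    max_daily_loss_pct: float = 0.02  # block new buys after this much loss today
    max_order_qty: int = 1_000


def guardian_check(order, equity, positions, daily_pnl, limits=GuardianLimits()):
    """Return (approved, reason).

    Deterministic: the model's reasoning is never an input here.
    positions maps symbol -> shares currently held.
    """
    if order.quantity <= 0 or order.quantity > limits.max_order_qty:
        return False, "quantity out of bounds"
    if order.side not in ("buy", "sell"):
        return False, "unknown side"

    if order.side == "sell":
        # Sells may only reduce an existing position (no shorting in this sketch).
        if order.quantity > positions.get(order.symbol, 0):
            return False, "sell exceeds holdings (no shorting)"
        return True, "ok"

    # Buys add risk, so they are blocked once the daily loss limit is hit.
    if daily_pnl <= -limits.max_daily_loss_pct * equity:
        return False, "daily loss limit reached"

    # Value existing shares at the order price as a simple approximation.
    held_value = positions.get(order.symbol, 0) * order.price
    new_value = held_value + order.quantity * order.price
    if new_value > limits.max_position_pct * equity:
        return False, "position size limit exceeded"
    return True, "ok"


if __name__ == "__main__":
    positions = {"AAPL": 10}
    proposed = Order("AAPL", "buy", 50, 200.0)  # toy values
    print(guardian_check(proposed, equity=100_000, positions=positions, daily_pnl=0.0))
```

Note that sells stay allowed after the daily loss limit: the sketch only blocks new risk, so the bot can always reduce exposure. Whether that’s the right rule for your setup is a design choice.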
Then I changed the question. I told it: don’t review the data or the current config. Review the commit history. Look at how the behavior changed over time, and tie the results to those changes.

Completely different analysis. The git history let it see what the snapshot couldn’t. The bot was never one system. It was a sequence of versions. Claude could date exactly when I changed the prompt and the guardian, then segment the performance around those commits instead of treating three months of mixed results as one blob. The “current” bot came from a recent commit, which meant a lot of the ugly numbers belonged to a version that no longer exists. That was the unlock.

When you ask Claude to judge an evolving project, what you point it at changes the answer:

Point it at the current state, and you get a snapshot verdict.

Point it at the commit history, and you get the story: when the behavior changed, and whether your recent results belong to the system you’re running now or to a dead one.

Same model, same repo, completely different read. And the version-history one was the honest one.

Still early. Still small. The bot’s whole job is to survive and keep learning, and honestly, so is mine. But if you’re building something iterative in Claude Code, especially prompts, guardrails, weights, etc., try pointing it at your commits, not just your current code. It surprised me.

submitted by /u/Admirable-Long-9713
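If you want to sanity-check the “segment the performance around those commits” step yourself, a small script like the one below is one way to do it. It’s a minimal sketch, not the author’s setup: it assumes you’ve already exported the dates of the commits that touched your prompts or guardian (for example with `git log --format=%cI -- <your prompt/guardian paths>`) and a list of closed trades with realized P&L. The function name `segment_by_commit`, the data shapes and the toy numbers in the demo are hypothetical, and unrealized positions are deliberately left out.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date


def segment_by_commit(commits, trades):
    """Group realized trade P&L by the version (commit) live when each trade closed.

    commits: list of (commit_date, sha) in any order.
    trades:  list of (close_date, pnl) for closed trades only.
    A trade closing on a commit's date is attributed to that new version.
    Trades before the first commit are grouped under "pre-history".
    """
    commits = sorted(commits)
    dates = [d for d, _ in commits]
    stats = defaultdict(lambda: {"trades": 0, "pnl": 0.0, "wins": 0})
    for close_date, pnl in trades:
        # Index of the latest commit on or before close_date (-1 if none).
        i = bisect_right(dates, close_date) - 1
        version = commits[i][1] if i >= 0 else "pre-history"
        s = stats[version]
        s["trades"] += 1
        s["pnl"] += pnl
        s["wins"] += int(pnl > 0)
    return dict(stats)


if __name__ == "__main__":
    # Toy data for illustration only, not real results.
    commits = [(date(2024, 1, 2), "a1"), (date(2024, 2, 15), "b2")]
    trades = [
        (date(2024, 1, 10), -120.0),
        (date(2024, 2, 1), -80.0),
        (date(2024, 2, 20), 45.0),
        (date(2024, 3, 1), 30.0),
    ]
    for version, s in segment_by_commit(commits, trades).items():
        print(version, s)
```

With plain dates, two commits on the same day can’t be told apart; using full commit timestamps (and trade close timestamps) avoids that.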
Originally posted by u/Admirable-Long-9713 on r/ClaudeCode
