Heya, it's amazing what agents like Claude Code can do interactively, and I've benefited quite a lot from that. But at work we had a long-running automated pipeline of sorts, and it kept failing at different tasks for different inputs. Our input range was on the order of tens of thousands, so fixing classes of errors by hand was going to take us a long time. We wanted to try using LLMs to explore and fix errors autonomously for us. The issue was that we had a few constraints that made existing interactive agents unusable, e.g.:
- A simple API call wasn't possible, as the error context varied quite a lot and we needed to let the LLM interactively explore the environment
- we couldn't let the LLM touch the core pipeline logic itself, mainly because it's unrelated to the error and it always polluted the context
- we needed to constrain what the LLM can do during solution exploration, because some fixes that might actually solve the problem were not part of the pipeline's responsibility at all
- once the LLM solves a class of problem, we shouldn't use the LLM again; instead, that should become a deterministic fix that can simply be called from "memory", and the LLM should only be used for new "unseen" problems
- we needed to do this on the fly, while the pipeline was running, without having to stop it, fix things, and re-run

Claude Code could solve it, but then we'd have to stop the pipeline and work interactively, and for a large class of errors that could still cost us a lot of time. Copilot kind of has a feature that automatically summarizes a bug from the CI part of the pipeline, but it's rather after the fact, and it can't actually understand the problem by probing the environment.

So I came up with a rule engine that stores "seen" problems as "rules" and fixes for them as "actions". Both the rules and the actions are self-contained and live outside the core pipeline logic. If a seen problem comes up again, it simply calls the related action. If the problem is unseen, it starts a conversation with an LLM and provides a user-defined set of tools (like specific folders and files that can be read or written, and specific commands the LLM is allowed to run) that the LLM can use to understand and solve the issue. Once the LLM solves it, it doesn't fix the pipeline logic itself, but rather writes a rule and an action for that issue. So the worst the LLM can do to our pipeline is leave it as it is.

This way, we were basically able to use a custom agent that is dumb and simply uses a "memory" of fixes, falling back to an LLM only when memory isn't enough. And it can all be done by "marking" a function with a simple decorator. A simplified architecture of theow would be:

https://preview.redd.it/aokinvxcqbkg1.png?width=3529&format=png&auto=webp&s=a0ca29beb91992c38d942eacd824164a19735274

This has helped speed up our process quite a bit, kept the agent fully autonomous but bounded (and to an extent even deterministic) in our pipeline, and optimized token usage as well. It currently works for any kind of process pipeline in Python, and supports the Anthropic, Gemini and Copilot SDKs.
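To make the idea concrete, here is a minimal sketch of the rule-memory-plus-decorator pattern described above. All names (`self_healing`, `ask_llm_for_fix`, `apply_action`, the rule-store format) are hypothetical illustrations, not theow's actual API:

```python
# Hypothetical sketch of the rule-engine idea: known errors replay a stored
# action deterministically; unseen errors go to a tool-constrained LLM, whose
# fix is then saved as a new rule + action. Not theow's real API.
import functools
import json
import pathlib

RULE_STORE = pathlib.Path("rules.json")  # the "memory", kept outside the pipeline code


def load_rules() -> dict:
    return json.loads(RULE_STORE.read_text()) if RULE_STORE.exists() else {}


def save_rule(signature: str, action: dict) -> None:
    rules = load_rules()
    rules[signature] = action
    RULE_STORE.write_text(json.dumps(rules))


def ask_llm_for_fix(err: Exception, allowed_paths, allowed_commands) -> dict:
    # Placeholder: start an LLM conversation here, exposing only the given
    # folders/files and commands as tools. Returns a self-contained action.
    raise NotImplementedError


def apply_action(action: dict) -> None:
    # Placeholder: execute the stored, self-contained fix
    # (e.g. run a script that lives outside the pipeline logic).
    pass


def self_healing(allowed_paths=(), allowed_commands=()):
    """Mark a pipeline step: seen errors are fixed from memory, unseen ones via LLM."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except Exception as err:
                signature = f"{type(err).__name__}:{err}"  # crude rule key for the sketch
                rules = load_rules()
                if signature in rules:
                    apply_action(rules[signature])  # deterministic replay, no LLM call
                else:
                    action = ask_llm_for_fix(err, allowed_paths, allowed_commands)
                    save_rule(signature, action)    # remember the fix as rule + action
                    apply_action(action)
                return fn(*args, **kwargs)          # retry the step; pipeline code untouched
        return wrapper
    return decorator
```

A pipeline step would then just be marked, e.g. `@self_healing(allowed_paths=("data/",), allowed_commands=("ls",))`, and the worst case on an unfixable error is that the original exception propagates as before.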
I'm also currently adding OpenAI support and a CLI layer on top, so it can be used in any process pipeline, for example CI, to make it self-healing when possible. This is probably not a general tool that will benefit everyone the way an interactive agent does, but I looked around for a bit and couldn't find agentic solutions that were programmatic and leashed to a certain extent. So it's free and open source, in case anyone else might find a use for it. I'm also trying to integrate it into my own projects' CI pipelines, for example to automatically fix code styles, unit tests etc. on the fly. WIP though, but it's a lot of fun.

If interested, here is the repo: https://github.com/adhityaravi/theow

Cheers!
Originally posted by u/4di on r/ArtificialInteligence
