I’ve been running an AI agent called Keats in production for a couple of months now, not as a side experiment but as the actual operational backbone of a small business. It manages its own cron schedules, writes to its own memory between sessions, monitors its own health, and as of tonight, plans and queues its own social media content. A few things surprised me that I haven’t seen discussed much.

Memory is where production agents actually break. Not reasoning, not tool use: memory. My agent would store the right fact and then fail to retrieve it at the decision point. The reasoning was correct given what it could see; it just couldn’t see the right things. I ended up building five memory layers with different retrieval weights depending on fact type. A fresh preference outranks an old one. A high-stakes decision outranks a low-stakes observation at the same similarity score. This sounds obvious, but most agent memory treats every fact as equally findable, and that’s why recall degrades.

Separating planning from execution cut my costs by 85%. I had seven cron jobs for social media, each spinning up a full reasoning session: forty-two Sonnet calls a day, with no shared state between any of them. I replaced all of it tonight with one planner that runs three times a day. It reads performance data, decides strategy, generates everything, and writes a timestamped action queue. A cheap model fires the queued actions every thirty minutes. Three expensive calls instead of forty-two. And because the planner reads yesterday’s results before making today’s decisions, the system actually improves over time instead of running the same blind strategy forever.

The self-modification thing is real, but the framing is wrong. The question isn’t “should agents edit themselves”; it’s “which edits are safe to automate.” I use four tiers. Schedule tweaks and step reordering happen without me. Changes to evaluation criteria need a documented hypothesis and a date to measure by.
Changes to cognitive defaults need a sub-agent review. Changes to trust boundaries or safety rules require me personally. The core safety constraints are immutable; the agent literally cannot weaken its own guardrails. Everything else is just governance.

If I were starting over, I’d build memory first, add feedback loops immediately, and tier the safety model early. An agent without feedback is just an expensive script that runs the same strategy until you notice it stopped working.

I wrote up some of the architecture in free guides on the Keats Library: memory patterns, scheduling architecture, self-modification governance, pre-mortems, and multi-model review. Happy to answer questions.
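For concreteness, the recency-and-stakes re-ranking described above can be sketched roughly like this. The weights, half-life, and fact-type names are my illustrative assumptions, not Keats’s actual values; the point is that retrieval re-ranks candidates instead of trusting raw similarity alone:

```python
import time

# Illustrative weights: high-stakes fact types outrank low-stakes ones
# at equal similarity. These numbers are assumptions, not Keats's values.
STAKES_WEIGHT = {"safety_rule": 2.0, "decision": 1.5, "preference": 1.2, "observation": 1.0}
HALF_LIFE_DAYS = 30  # recency decay half-life; a tunable assumption

def rank_memories(candidates, now=None, top_k=5):
    """candidates: dicts with 'text', 'similarity', 'kind', 'created_at' (epoch seconds)."""
    now = now or time.time()
    def score(m):
        age_days = (now - m["created_at"]) / 86400
        recency = 0.5 ** (age_days / HALF_LIFE_DAYS)   # fresh facts outrank stale ones
        stakes = STAKES_WEIGHT.get(m["kind"], 1.0)     # high-stakes outranks low-stakes
        # Blend: never zero out an old fact entirely, just discount it.
        return m["similarity"] * (0.5 + 0.5 * recency) * stakes
    return sorted(candidates, key=score, reverse=True)[:top_k]
```

With this scoring, a fresh decision and a sixty-day-old observation at the same similarity score no longer tie: the decision wins on both recency and stakes, which is exactly the ordering the post argues for.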
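The planner/executor split can be sketched as a file-backed action queue. The file name and schema here are assumptions for illustration: the expensive planner writes timestamped actions three times a day, and a cheap executor run every thirty minutes fires whatever is due, with no reasoning of its own:

```python
import json
import time
from pathlib import Path

QUEUE_PATH = Path("action_queue.json")  # assumed location, not Keats's actual file

def plan(actions):
    """Planner side: persist a list of {'run_at': epoch_seconds, 'action': ..., 'payload': ...}."""
    QUEUE_PATH.write_text(json.dumps(sorted(actions, key=lambda a: a["run_at"])))

def fire_due_actions(now=None):
    """Executor side: pop and return every action whose run_at has passed."""
    now = now or time.time()
    queue = json.loads(QUEUE_PATH.read_text()) if QUEUE_PATH.exists() else []
    due = [a for a in queue if a["run_at"] <= now]
    QUEUE_PATH.write_text(json.dumps([a for a in queue if a["run_at"] > now]))
    return due  # hand these to the cheap model or direct API calls
```

On the cost arithmetic: three planner calls plus roughly forty-eight executor runs a day replaces forty-two full reasoning sessions. If each executor run costs a small fraction of a planner call (cheap models are typically one to two orders of magnitude cheaper per call), the total plausibly lands near the 85% reduction the post reports.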
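The four-tier governance model could be encoded as an explicit gate that every self-edit must pass. The tier names, gate conditions, and protected paths below are my hypothetical rendering of the policy described above, not the author’s code:

```python
from enum import IntEnum

class Tier(IntEnum):
    SCHEDULE = 1      # schedule tweaks, step reordering: fully automated
    EVALUATION = 2    # evaluation criteria: needs hypothesis + measurement date
    COGNITIVE = 3     # cognitive defaults: needs sub-agent review
    TRUST = 4         # trust boundaries / safety rules: human only

# Paths the agent may never touch, regardless of tier (assumed names).
IMMUTABLE_PATHS = {"guardrails/", "safety_constraints.yaml"}

def approve_self_edit(tier, *, hypothesis=None, review_passed=False, human_ok=False, path=""):
    if any(path.startswith(p) for p in IMMUTABLE_PATHS):
        return False  # core constraints are immutable regardless of tier
    if tier == Tier.SCHEDULE:
        return True
    if tier == Tier.EVALUATION:
        return hypothesis is not None  # must document what is tested and when to measure
    if tier == Tier.COGNITIVE:
        return review_passed
    return human_ok  # Tier.TRUST
```

Making the immutable-path check run before any tier logic is the design point: even a fully automated Tier 1 edit cannot touch the guardrails, which matches the claim that the agent cannot weaken its own safety constraints.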
Originally posted by u/Ghattan on r/ArtificialInteligence
