eifachposte


Most AI agent frameworks treat the LLM as the subject of orchestration. The model:

- controls loops
- selects tools
- mutates execution flow
- decides retries
- effectively owns runtime topology

That’s fine for demos. It’s a disaster for:

- KYC/AML
- billing systems
- DevSecOps
- regulated infrastructure
- compliance-heavy environments

You can’t reliably:

- audit it
- replay it
- bound it
- formally reason about it

So we built a completely different runtime model: a deterministic FSM where the LLM is treated as a bounded compute unit instead of an autonomous orchestrator.

Demo: [LINK]

The architecture:

- deterministic FSM runtime
- constrained AST-based conditions
- ProjectionLayer (“evaluator blindness”)
- execution trace observability
- transition entropy monitoring
- governance attack injectors

Key difference vs LangGraph / AutoGen style systems

The LLM never owns orchestration

The runtime controls:

- execution graph
- transitions
- governance
- topology

The model computes a bounded step only.

System decides → LLM computes

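To make "system decides → LLM computes" concrete, here is a minimal sketch of that split. Everything in it (`GovernedFSM`, `StepFn`, the state names, the `REJECTED` sentinel) is a hypothetical illustration, not the project's actual API. The model is reduced to a function that returns an outcome label; the static transition table, not the model, decides what happens next.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical names throughout; this is an illustration, not the repo's API.
StepFn = Callable[[str, dict], str]  # (state, projected input) -> outcome label


@dataclass
class GovernedFSM:
    # Static transition table: (state, outcome) -> next state.
    transitions: Dict[Tuple[str, str], str]
    terminal: FrozenSet[str]
    max_steps: int = 16
    trace: List[Tuple[str, str, str]] = field(default_factory=list)

    def run(self, start: str, compute: StepFn, payload: dict) -> str:
        state = start
        for _ in range(self.max_steps):
            if state in self.terminal:
                return state
            # The model only labels the outcome of the current step.
            outcome = compute(state, payload)
            key = (state, outcome)
            if key not in self.transitions:
                # Anything outside the table is rejected, never improvised.
                self.trace.append((state, outcome, "REJECTED"))
                return "REJECTED"
            nxt = self.transitions[key]
            self.trace.append((state, outcome, nxt))
            state = nxt
        # Step budget exhausted: accept only if we landed on a terminal state.
        return state if state in self.terminal else "REJECTED"


TABLE = {
    ("collect", "ok"): "verify",
    ("verify", "approve"): "done",
    ("verify", "deny"): "denied",
}
TERMINAL = frozenset({"done", "denied"})


def stub_model(state: str, payload: dict) -> str:
    # Stand-in for an LLM call that returns one outcome label per state.
    return {"collect": "ok", "verify": "approve"}[state]


def rogue_model(state: str, payload: dict) -> str:
    # A model that tries to pick its own destination.
    return "jump_to_done"


fsm = GovernedFSM(TABLE, TERMINAL)
result = fsm.run("collect", stub_model, {})

rogue_fsm = GovernedFSM(TABLE, TERMINAL)
rogue_result = rogue_fsm.run("collect", rogue_model, {})
```

The rogue model cannot steer execution: its label is not in the table, so the run ends in `REJECTED` and the attempt is recorded in the trace.
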
ProjectionLayer (Evaluator Blindness)

The LLM never receives full context. It only receives a sanitized target-specific projection.

The model cannot see:

- governance metadata
- rollback density
- policy internals
- trace health
- execution anomalies

This prevents:

- semantic contamination
- governance overfitting
- adaptive behavior under observation

It behaves more like a capability-security boundary than prompt engineering.

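One way to picture the projection idea is a per-target allowlist applied before anything reaches the model. This is a sketch under assumptions: the `PROJECTIONS` table, the `project` function, and the field names are invented for illustration and are not taken from the repo.

```python
from types import MappingProxyType
from typing import Any, FrozenSet, Mapping

# Hypothetical allowlists: which context fields each target step may see.
PROJECTIONS: Mapping[str, FrozenSet[str]] = {
    "classify_document": frozenset({"document_text", "document_type"}),
    "summarize_case": frozenset({"case_notes"}),
}


def project(target: str, context: Mapping[str, Any]) -> Mapping[str, Any]:
    """Return a read-only view holding only the fields allowed for `target`."""
    # Unknown targets raise KeyError, so the layer fails closed.
    allowed = PROJECTIONS[target]
    return MappingProxyType({k: v for k, v in context.items() if k in allowed})


full_context = {
    "document_text": "Passport scan, page 1",
    "document_type": "passport",
    "rollback_density": 0.12,       # governance signal, never projected
    "trace_health": "degraded",     # governance signal, never projected
    "policy_internals": {"threshold": 0.8},
}

view = project("classify_document", full_context)
```

Because the filter is an allowlist, a new governance field added to the context later stays invisible to the model by default. `MappingProxyType` only makes the top-level view read-only; nested values would need their own handling.
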
No eval()/exec()

Conditions are evaluated through a constrained AST engine.

No:

- arbitrary Python
- dynamic execution
- method calls
- unrestricted expressions

This intentionally limits semantic surface area.

The design philosophy is closer to:

- Rego / OPA
- Terraform HCL
- IAM policy DSLs

than AI agent frameworks.

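A constrained evaluator of this kind can be sketched with Python's standard `ast` module: parse the condition, walk the tree, and reject every node type that is not explicitly allowed. The allowlist below (constants, variable names, comparisons, `and` / `or` / `not`) is an assumption chosen for illustration; the project's real grammar may differ.

```python
import ast
import operator
from typing import Any, Mapping

# Allowed comparison operators; everything else is rejected.
_CMP = {
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
    ast.Lt: operator.lt, ast.LtE: operator.le,
    ast.Gt: operator.gt, ast.GtE: operator.ge,
}


def evaluate(expr: str, variables: Mapping[str, Any]) -> bool:
    """Evaluate a condition without eval()/exec(); unknown nodes raise ValueError."""
    tree = ast.parse(expr, mode="eval")
    return bool(_eval(tree.body, variables))


def _eval(node: ast.AST, variables: Mapping[str, Any]) -> Any:
    if isinstance(node, ast.Constant) and isinstance(node.value, (bool, int, float, str)):
        return node.value
    if isinstance(node, ast.Name):
        if node.id not in variables:
            raise ValueError(f"unknown variable: {node.id}")
        return variables[node.id]
    if isinstance(node, ast.BoolOp):
        # Operands are evaluated eagerly (no short-circuit) to keep the walker simple.
        values = [_eval(v, variables) for v in node.values]
        return all(values) if isinstance(node.op, ast.And) else any(values)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
        return not _eval(node.operand, variables)
    if isinstance(node, ast.Compare):
        left = _eval(node.left, variables)
        for op, comparator in zip(node.ops, node.comparators):
            fn = _CMP.get(type(op))
            if fn is None:
                raise ValueError(f"disallowed operator: {type(op).__name__}")
            right = _eval(comparator, variables)
            if not fn(left, right):
                return False
            left = right
        return True
    # Calls, attribute access, subscripts, lambdas, etc. all end up here.
    raise ValueError(f"disallowed expression: {type(node).__name__}")


allowed = evaluate("risk_score < 0.8 and country != 'XX'",
                   {"risk_score": 0.3, "country": "CH"})
```

With this walker, `evaluate("__import__('os').system('id')", {})` and `evaluate("user.is_admin", {"user": object()})` both raise `ValueError`, since `Call` and `Attribute` nodes are not on the allowlist.
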
Transition Entropy

We monitor structural instability of execution semantics.

Not:

- token counts
- prompt traces
- latency dashboards

But:

- execution path variance
- transition entropy
- topology degradation

If entropy exceeds an empirical threshold (>2.5 bits), the runtime flags unstable execution behavior.

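The post does not say exactly how the entropy is computed, so the following is only one plausible reading: Shannon entropy, in bits, over the frequency distribution of observed `(from_state, to_state)` transitions in a window of the trace. The function names are hypothetical; the 2.5-bit threshold is the one quoted above.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

ENTROPY_THRESHOLD_BITS = 2.5  # empirical threshold quoted in the post

Transition = Tuple[str, str]  # (from_state, to_state)


def transition_entropy(transitions: Iterable[Transition]) -> float:
    """Shannon entropy (bits) of the observed transition distribution."""
    counts = Counter(transitions)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def is_unstable(transitions: Iterable[Transition]) -> bool:
    # Assumed flagging rule: strictly above the threshold.
    return transition_entropy(transitions) > ENTROPY_THRESHOLD_BITS


# A run that keeps repeating the same two transitions.
stable_window = [("collect", "verify"), ("verify", "done")] * 10

# A run that wanders through eight different transitions, once each.
chaotic_window = [(f"s{i}", f"s{i + 1}") for i in range(8)]

stable_bits = transition_entropy(stable_window)
chaotic_bits = transition_entropy(chaotic_window)
```

Two equally frequent transitions give 1 bit and stay under the threshold; eight equally frequent transitions give 3 bits and get flagged. In practice the window size and the baseline for a healthy topology would have to be tuned per workflow.
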
Failure Laboratory

The repo includes deliberate governance attack injectors:

- tool injection
- policy bypass
- step reordering
- corrupted receipts
- GDPR erase simulation

The point is to test deterministic failure handling under adversarial conditions.

Most demos only show happy paths. We intentionally expose failure semantics.

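As a rough illustration of what an injector plus a deterministic check could look like, the sketch below models execution receipts as a SHA-256 hash chain and then attacks it in two ways: corrupting a receipt and reordering steps. The receipt format and every function name here are assumptions made for the example; the post does not describe how the repo actually represents receipts.

```python
import copy
import hashlib
import json
from typing import Dict, List, Tuple

GENESIS = "0" * 64  # hypothetical anchor for the first receipt


def _digest(prev_hash: str, step: str, payload: dict) -> str:
    body = json.dumps({"prev": prev_hash, "step": step, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def issue_receipts(steps: List[Tuple[str, dict]]) -> List[Dict]:
    """Build a hash chain: each receipt commits to the previous one."""
    receipts, prev = [], GENESIS
    for step, payload in steps:
        h = _digest(prev, step, payload)
        receipts.append({"step": step, "payload": payload, "prev": prev, "hash": h})
        prev = h
    return receipts


def verify_receipts(receipts: List[Dict]) -> int:
    """Return the index of the first invalid receipt, or -1 if the chain is intact."""
    prev = GENESIS
    for i, r in enumerate(receipts):
        if r["prev"] != prev or r["hash"] != _digest(prev, r["step"], r["payload"]):
            return i
        prev = r["hash"]
    return -1


def inject_corrupted_receipt(receipts: List[Dict], index: int) -> List[Dict]:
    """Attack: alter one payload without recomputing its hash."""
    tampered = copy.deepcopy(receipts)
    tampered[index]["payload"] = {"tampered": True}
    return tampered


def inject_step_reordering(receipts: List[Dict], i: int, j: int) -> List[Dict]:
    """Attack: swap two receipts in the recorded order."""
    tampered = copy.deepcopy(receipts)
    tampered[i], tampered[j] = tampered[j], tampered[i]
    return tampered


chain = issue_receipts([("collect", {"id": 1}), ("verify", {"ok": True}), ("commit", {})])
clean = verify_receipts(chain)
corrupted_at = verify_receipts(inject_corrupted_receipt(chain, 1))
reordered_at = verify_receipts(inject_step_reordering(chain, 0, 2))
```

Both attacks fail the same way on every run: the verifier returns the index of the first receipt that no longer matches the chain, which is the kind of reproducible failure semantics the section is about.
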
Transactional AI Code Mutation

The development agent also follows governed execution principles.

Repository mutation flow:

stage_patch() → validate_staged_mypy(tmpdir) → pytest → atomic commit OR rollback

The repo is never mutated before validation succeeds. This gives CI-grade mutation safety for AI-assisted development.

Stack:

- Python 3.10+
- Streamlit
- mypy --strict
- pytest
- deterministic FSM runtime

Current status:

- 51/51 tests PASS
- 0 mypy errors

Question for the community: Are autonomous agents fundamentally the wrong abstraction for production AI systems? Is “Governed Probabilistic Execution” a more viable long-term direction for enterprise AI infrastructure?

submitted by /u/ale007xd

Originally posted by u/ale007xd on r/ArtificialInteligence

Why we locked an LLM inside a deterministic FSM (and built a failure laboratory around it)
