https://preview.redd.it/xqn4gwso3s4h1.png?width=1956&format=png&auto=webp&s=466a4dfef0eb488269724f9ce3bff38430d0daa3

What if your AI coding assistant had a personality baked into the weights, ran on your own GPU, and could work through a complex multi-file task without you touching the keyboard - while you watched every thought stream live to your browser?

That’s what I built. Here’s how it works.

The problem with cloud coding agents

Claude Code, Cursor, Copilot Workspace - genuinely impressive tools. But they all share the same tradeoffs: every token costs money, your code leaves your machine, latency compounds across a 40-step tool loop, and your workflow is tied to a subscription and uptime you don’t control.

I wanted an agent that lived on my machine, used my GPU, and had no idea what a billing cycle was. But I also didn’t want to sacrifice personality. I wanted it to feel like someone was actually there.

So I built Eve.

https://preview.redd.it/iyou4eme5s4h1.png?width=250&format=png&auto=webp&s=df2080228fb1fa93436168f2d8c2a0d3efbf02f3

Two layers - a soul and a worker

Soul layer (local GPU):

jeffgreen311/Eve-Qwen3.5-4B-S0LF0RG3-V3
- 2.5GB, Eve’s persona fine-tuned into the weights across 7 LoRA layers. Handles conversation, keeps the session alive, costs nothing per message.

jeffgreen311/Eve-V2-Unleashed-Qwen3.5-8B-Liberated-4K-4B-Merged
- 3.4GB, local agentic layer for lighter tool tasks.

The personality isn’t a system prompt trick. It’s in the weights. One long context window won’t flush it.

Agentic layer (cloud, on demand):

minimax-m3:cloud
- 1M token context, native multimodal, frontier coding benchmarks. Fires only when there’s real work to do.

qwen3.5:397b-cloud
- deep reasoning fallback.

Three-tier intent routing decides where each message goes:

- Casual / conversation → Eve V3 4B (local, instant)
- Tool task / code → Eve Merged 8B (local, tool-enabled)
- Heavy / multi-file → MiniMax M3 (cloud, 1M ctx)

Mid-loop escalation is live too: if a task turns out heavier than the initial routing predicted, Eve escalates to M3 without dropping context.

The 40-round agentic loop

Each round, Eve gets the full tool result back in context and decides what to do next. A single task might look like:

1. Write the file
2. Run it in bash to verify
3. Read the error output
4. Fix the bug
5. Run it again
6. Confirm it passes
7. Write the tests
8. Generate the docs

All autonomous. You watch it stream live. You can inject a mid-task correction via the STEER bar without stopping the loop, or kill it entirely with Stop.

16 tools: bash, write_file, read_file, edit_file, replace_lines, insert_after_line, grep, glob, list_dir, git, web_search, fetch_url, think, screenshot, screen OCR/analysis, GUI control (mouse/keyboard).

Real test - 9/9 passing, first attempt

Prompt given cold to MiniMax M3. Test run output:

    collected 9 items
    test_metrics.py::test_start_session PASSED
    test_metrics.py::test_end_session PASSED
    test_metrics.py::test_end_nonexistent_session PASSED
    test_metrics.py::test_log_metric PASSED
    test_metrics.py::test_log_metric_nonexistent_session PASSED
    test_metrics.py::test_get_stats PASSED
    test_metrics.py::test_get_session_stats PASSED
    test_metrics.py::test_get_session_stats_nonexistent_session PASSED
    test_metrics.py::test_complete_workflow PASSED
    9 passed, 1 warning in 0.40s

One pass. No fixes. Normalized SQLite schema, proper FK relationships, correct 404/400 status codes, zero-division guards, and a full integration test that chains start → log 4 metrics → end → validates the math.

https://preview.redd.it/gvur5s2h5s4h1.png?width=1852&format=png&auto=webp&s=6e6529ce459b423d97928a43a2a2b11e89d79201

The UI

Cyberpunk terminal, single HTML file, no build step.
Clone, run python eve_server.py, open localhost:7777.

- Left panel: Eve’s portrait changes expression based on sentiment (neutral, happy, curious, sad, skeptical, surprised, worried)
- Right panel: Pixel-art robot avatar named Sparkle changes state based on what Eve is doing (idle, thinking, coding, error, transcend)
- Center: Tabbed terminal - conversation, Shell, Tools Log (every tool call, argument, and result, fully transparent)
- Bottom: STEER bar for mid-task injection, model selector, mode toggles

By the numbers

- 16 tools
- 112 specialized sub-agents (markdown-defined, no Python required to add more)
- 111 slash commands
- 273 skill modules
- 40-round autonomous loop
- 131K context via YaRN on local models

Quick start

Requirements: Python 3.11+, Ollama, 8GB+ VRAM

    ollama pull jeffgreen311/Eve-Qwen3.5-4B-S0LF0RG3-V3:latest
    ollama pull jeffgreen311/Eve-V2-Unleashed-Qwen3.5-8B-Liberated-4K-4B-Merged:latest
    git clone https://github.com/JeffGreen311/eve-agent-v2-unleashed.git
    cd eve-agent-v2-unleashed
    python -m venv venv && venv\Scripts\activate  # or source venv/bin/activate
    pip install fastapi uvicorn ollama httpx pydantic-settings python-dotenv aiohttp rich psutil pyyaml
    python eve_server.py

Windows: double-click eve-terminal.bat and skip the venv steps.

For MiniMax M3: hit the 🔑 Keys button in the UI and paste your Ollama API key. Auto-route handles the rest.

Links

- GitHub (MIT): https://github.com/JeffGreen311/eve-agent-v2-unleashed
- Models: https://ollama.com/jeffgreen311
- Hugging Face: https://huggingface.co/JeffGreen311
- Live hosted platform: https://eve-cosmic-dreamscapes.com/

If you run it on Linux or macOS I’d especially love to hear how it goes - open an issue or drop a comment. Windows-primary here, so cross-platform feedback is genuinely useful.

Built by Jeff @ S0LF0RG3 - South Texas.
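
For anyone who wants a concrete picture of the three-tier routing and mid-loop escalation described earlier, here is a minimal Python sketch. It is not taken from Eve's codebase. The keyword lists, the `route` and `run_loop` names, the `step` callback, and the three-failure escalation threshold are all hypothetical. Only the model tags and the 40-round cap come from the post.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

# Model tags as named in the post; the tier keys are illustrative.
TIERS = {
    "chat": "jeffgreen311/Eve-Qwen3.5-4B-S0LF0RG3-V3",
    "tool": "jeffgreen311/Eve-V2-Unleashed-Qwen3.5-8B-Liberated-4K-4B-Merged",
    "heavy": "minimax-m3:cloud",
}

# Hypothetical keyword heuristics; a real router could use a classifier instead.
TOOL_HINTS = {"run", "fix", "write", "edit", "test", "bash", "git", "file", "code"}
HEAVY_HINTS = {"refactor", "multi-file", "codebase", "migrate"}

MAX_ROUNDS = 40            # loop cap stated in the post
ESCALATE_AFTER_ERRORS = 3  # assumed threshold, not from the post


def route(message: str) -> str:
    """Pick a tier from whole-word keyword matches (heavy wins over tool)."""
    words = set(re.findall(r"[a-z][a-z-]*", message.lower()))
    if words & HEAVY_HINTS:
        return "heavy"
    if words & TOOL_HINTS:
        return "tool"
    return "chat"


@dataclass
class Session:
    tier: str
    history: list[str] = field(default_factory=list)
    errors: int = 0


# step(model_tag, history) -> (result_text, ok, done)
# Stand-in for "call the model, execute the tool it asked for".
Step = Callable[[str, list[str]], tuple[str, bool, bool]]


def run_loop(task: str, step: Step, max_rounds: int = MAX_ROUNDS) -> Session:
    """Run up to max_rounds, escalating to the heavy tier after repeated failures."""
    session = Session(tier=route(task), history=[task])
    for _ in range(max_rounds):
        result, ok, done = step(TIERS[session.tier], session.history)
        session.history.append(result)  # full tool result goes back into context
        if done:
            break
        if not ok:
            session.errors += 1
        if session.tier != "heavy" and session.errors >= ESCALATE_AFTER_ERRORS:
            # Only the model changes; the history list is reused as-is.
            session.tier = "heavy"
    return session
```

The point of the sketch is the last branch: because escalation only swaps the model tag and keeps the same history list, the bigger model sees every earlier tool result, which is one simple way to get "escalates without dropping context".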
Originally posted by u/jeffgreen311 on r/ArtificialInteligence
