
Disclosure: I’m the solo developer of CortexOS, an iOS journaling app that runs AI entirely on-device. I want to share the technical architecture because the tradeoffs were genuinely interesting, and I haven’t seen many people ship on-device LLMs in production consumer apps yet.

The Problem

Every “AI journal” I found sends your entries to OpenAI or Anthropic’s API for analysis. For a journal, arguably the most private data someone produces, that felt fundamentally wrong. I wanted to build something where the AI runs locally, the data is encrypted at rest, and nothing ever leaves the phone. Not even to my own servers.

The Stack

- On-device LLM: Llama 3.2 1B (4-bit quantized), running via Apple's MLX framework. The model downloads once (~1GB) on first use and runs entirely on the Neural Engine / GPU. No internet required after that.
- Sentiment pipeline: a two-tier system. The fast path uses Apple's NLTagger + CoreML for instant emotion detection at save time (20+ emotions). The slow path triggers the LLM 3 seconds post-save for deep therapeutic analysis; it runs async in the background so the UI never blocks.
- Voice transcription: WhisperKit, also fully on-device. Speak your entry, transcription happens locally, and no audio is ever transmitted.
- Encryption: AES-256-GCM via CryptoKit on every entry before it touches storage. Cloud backup is zero-knowledge; the server stores opaque encrypted blobs. I literally cannot read user data, even with full database access.
- Adaptive Intelligence (newest piece): a compressed psychological profile (~2-4KB) that builds over time from the user's entries. It captures emotional patterns, cognitive tendencies, recurring themes, and growth areas. This gets injected as context into the LLM's system prompt across 15 different call sites, so the AI's analysis, reflections, and nudges get more personalized the longer someone journals. The profile consolidates nightly via a background worker, is encrypted with the same AES-256-GCM, and never leaves the device.
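To make the encryption step concrete, here is a minimal CryptoKit sketch of sealing an entry with AES-256-GCM before it touches storage. This is an illustration of the approach, not CortexOS's actual code: the `JournalEntryCipher` name is invented, and a real app would pull the key from the Keychain rather than generating it inline.

```swift
import CryptoKit
import Foundation

// Illustrative sketch: AES-256-GCM at-rest encryption of a journal entry.
// In production the SymmetricKey would live in the Keychain, not be
// generated per-launch as it is here.
struct JournalEntryCipher {
    let key: SymmetricKey

    // Seals the plaintext; the combined representation bundles
    // nonce + ciphertext + GCM auth tag into one opaque blob for storage.
    func seal(_ plaintext: String) throws -> Data {
        let box = try AES.GCM.seal(Data(plaintext.utf8), using: key)
        // .combined is non-nil when the default 12-byte nonce is used.
        return box.combined!
    }

    // Opens a stored blob; throws if the blob was tampered with.
    func open(_ blob: Data) throws -> String {
        let box = try AES.GCM.SealedBox(combined: blob)
        let plaintext = try AES.GCM.open(box, using: key)
        return String(decoding: plaintext, as: UTF8.self)
    }
}

// Usage: round-trip an entry through seal/open.
let cipher = JournalEntryCipher(key: SymmetricKey(size: .bits256))
let blob = try cipher.seal("Wrote three pages before breakfast.")
let restored = try cipher.open(blob)
```

Because the auth tag travels with the blob, any bit-flip in storage or transit makes `open` throw instead of returning garbage, which is what makes the server-side blobs safely opaque.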

Key Tradeoffs and Limitations

- 1B parameters is a real constraint. You're not getting GPT-4 quality analysis. But for the specific task of reflecting on a journal entry - identifying emotional patterns, surfacing cognitive distortions, asking good follow-up questions - a fine-tuned small model performs surprisingly well. The responses are genuinely useful, not generic platitudes.
- Cold start latency. First LLM inference after app launch takes 3-5 seconds to load the model into memory; subsequent calls are fast. I solved the UX problem by running analysis async post-save: the user writes, saves instantly, and the deep analysis appears when they revisit the entry.
- Memory pressure. A 1B model in memory alongside a SwiftUI app on an iPhone is tight. I had to be aggressive with the model lifecycle: load on demand, release when backgrounded, cache the psyche profile prompt to avoid redundant formatting.
- No fine-tuning feedback loop. Unlike cloud-based AI apps, I can't improve the base model from user interactions (nor would I want to; that would compromise privacy). The Adaptive Intelligence layer is my answer to this: the model doesn't get smarter globally, but its context about each individual user gets richer over time.
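The load-on-demand / release-when-backgrounded lifecycle could be sketched like this. `ModelLifecycle` and `LanguageModel` are hypothetical stand-ins for whatever MLX wrapper the app actually uses; only the pattern (lazy load, explicit release) comes from the post.

```swift
import Foundation

// Hypothetical stand-in for the app's MLX model wrapper.
struct LanguageModel {
    static func load() async throws -> LanguageModel {
        // In the real app: load ~1GB of 4-bit weights (the 3-5 s cold start).
        LanguageModel()
    }
    func complete(_ prompt: String) async throws -> String { prompt }
}

// Actor serializes access to the model and owns its lifecycle.
actor ModelLifecycle {
    private var model: LanguageModel?

    // Lazily loads on first inference; later calls reuse the loaded model,
    // which is why only the first call pays the cold-start cost.
    func generate(prompt: String) async throws -> String {
        if model == nil {
            model = try await LanguageModel.load()
        }
        return try await model!.complete(prompt)
    }

    // Call from UIApplication.didEnterBackgroundNotification (or similar)
    // so the weights are released before iOS applies memory pressure.
    func releaseModel() {
        model = nil
    }
}
```

Putting the model behind an actor also prevents two concurrent call sites from racing to load the weights twice.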

What I Learned

The biggest insight: privacy and intelligence aren't opposites. The common assumption is that on-device = dumber AI. But by building the psyche profiling layer that accumulates understanding locally, the 1B model with rich personal context often produces more relevant output than a 70B model with zero context about the user.

The second insight: people write differently when they trust the system. Early testers who understood the zero-knowledge architecture wrote noticeably more honest, vulnerable entries than those who assumed it was "just another app." The encryption isn't just a feature; it changes the quality of the input, which changes the quality of the AI output.

Built everything solo over the past few months. Happy to go deeper on any part of the architecture.
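As a sketch of what "rich personal context" could mean in practice, here is one way a compact profile might be folded into the system prompt at a call site. The `PsycheProfile` shape and every field name are invented for illustration; the post only says a ~2-4KB profile is injected as context.

```swift
import Foundation

// Invented shape for the compressed profile; the real fields are unknown.
struct PsycheProfile {
    var emotionalPatterns: [String]
    var recurringThemes: [String]
    var growthAreas: [String]
}

// Folds the profile (if any) into the system prompt, so the same 1B model
// gets progressively richer per-user context without any global retraining.
func systemPrompt(injecting profile: PsycheProfile?) -> String {
    var prompt = "You are a reflective journaling assistant."
    guard let p = profile else { return prompt }
    prompt += "\nEmotional patterns seen so far: \(p.emotionalPatterns.joined(separator: ", "))."
    prompt += "\nRecurring themes: \(p.recurringThemes.joined(separator: ", "))."
    prompt += "\nGrowth areas to gently encourage: \(p.growthAreas.joined(separator: ", "))."
    return prompt
}
```

Because the profile is a few kilobytes of text, this kind of injection costs only a few hundred prompt tokens per call, which a small local model can absorb without hurting latency.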

Originally posted by u/StellarLuck88 on r/ArtificialInteligence