Original Reddit post

I’ve been building Kim, a personal health agent meant for ongoing use. Instead of answering one-off queries, the goal is for it to answer questions and surface insights from a user’s health data over time (wearables, labs, symptoms, habits, etc.). Two challenges have been especially difficult:

Long-term memory management: Maintaining useful context across weeks or months is hard. Simple vector retrieval starts to degrade once there are months of personal data. I’ve been experimenting with what to persist, how to summarize or forget older information, and how to handle conflicting signals across data sources. Even with better embeddings, retrieval quality and relevance remain inconsistent for longitudinal personal data.

Reliability and hallucination: Even when grounded in the user’s actual data, the agent still hallucinates or overgeneralizes, especially when synthesizing information across multiple sources or time periods. I’ve tried different grounding techniques and structured outputs, but getting consistent reliability on messy, incomplete, or subjective personal data is still difficult. Evaluation is also tricky since there’s often no clear ground truth.

Curious how others building personal or long-running agents are handling memory architectures and reducing hallucination with noisy real-world data.
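For the memory problem described above, one commonly used option is to rank retrieved memories by similarity scaled by a recency weight, so that stale observations lose out to recent ones when signals conflict. The following is only a minimal sketch of that idea, not something the post says Kim uses. The `Memory` class, the `decayed_score` and `rank` helpers, the 30-day half-life, and the sample records are all hypothetical, and the similarity values are assumed to come from whatever embedding search is already in place.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    similarity: float  # assumed: cosine similarity to the query, in [0, 1]
    age_days: float    # days since the observation was recorded


def decayed_score(m: Memory, half_life_days: float = 30.0) -> float:
    """Similarity scaled by an exponential recency weight.

    The weight halves every `half_life_days` (an illustrative default).
    """
    return m.similarity * 0.5 ** (m.age_days / half_life_days)


def rank(memories: list[Memory], half_life_days: float = 30.0, k: int = 3) -> list[Memory]:
    """Return the k memories with the highest recency-weighted score."""
    return sorted(
        memories,
        key=lambda m: decayed_score(m, half_life_days),
        reverse=True,
    )[:k]


# Hypothetical records: the older entry matches the query better,
# but the recent one reflects the user's current state.
memories = [
    Memory("Resting HR averaged 58 bpm", similarity=0.90, age_days=120),
    Memory("Resting HR averaged 64 bpm", similarity=0.80, age_days=3),
    Memory("Started magnesium supplement", similarity=0.40, age_days=10),
]

top = rank(memories, k=2)
for m in top:
    print(f"{decayed_score(m):.3f}  {m.text}")
```

With these made-up numbers, the 120-day-old reading drops out of the top two despite having the highest raw similarity. A single global half-life is a crude choice: lab results and long-term diagnoses probably should decay far more slowly than daily wearable readings, so a per-source half-life may be a more realistic variant.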

Originally posted by u/Abject_Chocolate8834 on r/ArtificialInteligence