
For the Google Gemini 3 Hackathon, I built AI Justitia Lens, a multimodal AI agent system that cross-verifies written reports against visual and audio evidence to detect discrepancies.

🔍 Problem

In high-stakes environments, reports and recorded evidence don't always align. Manually reviewing large volumes of multimodal data is time-consuming and error-prone.

🧠 What I Built

- Multimodal reasoning using Gemini
- Agent-based pipeline for cross-verification
- Text–image/video consistency checks
- Discrepancy detection and structured output (see the sketch at the end of this post)

💡 Key Learning

Multimodal AI isn't just about analyzing images or text independently; the real power lies in cross-modal reasoning and structured comparison.

I'm curious: what did you build for the hackathon? Did you experiment with:

- Tool-calling agents?
- Real-time applications?
- Vision + reasoning workflows?
- RAG + multimodal systems?

Would love to see the different approaches people took with Gemini 3.
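To make the cross-verification and structured-output steps concrete, here is a minimal sketch of what a single text-vs-image consistency check could look like with the google-genai Python SDK. The model name, report text, image path, and schema are all placeholder assumptions for illustration, not the actual project code.

```python
# Minimal sketch: cross-verify a written report against one photo with Gemini.
# Assumptions: the google-genai SDK is installed, GOOGLE_API_KEY is set in the
# environment, and the model name is a placeholder (swap in the Gemini 3 model).
from pydantic import BaseModel
from google import genai
from google.genai import types


class Discrepancy(BaseModel):
    claim: str     # statement taken from the written report
    evidence: str  # what the image actually shows
    severity: str  # e.g. "minor" or "major"


class VerificationResult(BaseModel):
    consistent: bool
    discrepancies: list[Discrepancy]


client = genai.Client()  # picks up GOOGLE_API_KEY from the environment

report = "The vehicle sustained no visible damage to the front bumper."  # hypothetical
with open("scene_photo.jpg", "rb") as f:  # hypothetical evidence file
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-2.5-flash",  # placeholder model name
    contents=[
        "Compare the written report below against the attached photo and "
        "list every claim that the image contradicts.",
        report,
        types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
    ],
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=VerificationResult,  # constrains output to the schema
    ),
)

result: VerificationResult = response.parsed
for d in result.discrepancies:
    print(f"[{d.severity}] report: {d.claim!r} / image: {d.evidence!r}")
```

An agent pipeline like the one described would presumably run a check like this once per evidence item and aggregate the structured results; that orchestration layer is beyond this sketch.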

Originally posted by u/Successful_Annual626 on r/ArtificialInteligence