Original Reddit post

I wanted to share a desktop tool I’ve been developing called AI Media Core.

As a creator, I got completely sick of dealing with camera-generated names like DJI_0021.MOV or DSCN1234.mov. While there are a few AI renamers out there, almost all of them are limited to basic JPEGs and rely purely on cloud API calls, which gets incredibly expensive and slow when you are dumping a 64GB SD card full of high-bitrate raw video files. I built this to bridge the gap between heavy media workflows and multimodal AI models directly on the Mac.

The Technical Setup & Multi-Model Matrix:

To make the tool resilient and cost-effective, I structured it around a 4-tier processing pipeline:

- Local AI (Fully Offline): The app runs a local 4.4 GB vision model. Once the framework is downloaded, it executes entirely on-device. It is fully Metal-accelerated, meaning it uses the unified memory on Apple Silicon (M1-M4) to run fast vision inference without sending private media assets to an external server.
- Smart AI (GPT-4o API integration): For highly complex scenes, low-light footage, or nuanced details, users can supply their own OpenAI API key to run deep cloud vision parsing (averaging roughly $0.001 per file).
- Smart AI Mini/Turbo: A faster, lightweight tier optimized for large, multi-gigabyte batch sessions where throughput matters more than absolute semantic depth.
- Heuristic Fallback: If API limits are hit or local constraints occur, it falls back to parsing raw EXIF coordinate data combined with direct frame color profile analysis, so the process never fails silently.

Format Challenges & Architecture:

One of the biggest hurdles was ensuring it didn’t just read basic web formats. The app supports 20 image formats and 20 video formats out of the box, including professional camera RAW files, HEIC, AVIF, MKV, and MTS streams.
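For readers wondering what a tiered pipeline like the one above might look like in code, here is a minimal sketch of a fallback chain. It is not the app's actual implementation: the `Tier` type, the `TierUnavailable` exception, the stub tier functions, and the order in which tiers are tried are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


class TierUnavailable(Exception):
    """Raised by a tier that cannot handle the file (no model, no key, rate limit)."""


@dataclass
class Tier:
    name: str
    describe: Callable[[str], str]  # takes a file path, returns a description


def describe_with_fallback(path: str, tiers: List[Tier]) -> Tuple[str, str]:
    """Try each tier in order; return (tier_name, description) from the first success."""
    errors = []
    for tier in tiers:
        try:
            return tier.name, tier.describe(path)
        except TierUnavailable as exc:
            errors.append(f"{tier.name}: {exc}")
    # Never fail silently: report why every tier was skipped.
    raise RuntimeError("all tiers failed: " + "; ".join(errors))


# Hypothetical stand-ins for the real tiers, just to exercise the chain.
def local_model(path: str) -> str:
    raise TierUnavailable("model not downloaded")


def cloud_model(path: str) -> str:
    raise TierUnavailable("API rate limit hit")


def heuristic(path: str) -> str:
    # A real version would read EXIF data and frame colors; this is a stub.
    return "unlabeled clip"


TIERS = [
    Tier("local", local_model),
    Tier("cloud", cloud_model),
    Tier("heuristic", heuristic),
]

tier_name, description = describe_with_fallback("DJI_0021.MOV", TIERS)
print(tier_name, description)
```

Keeping each tier behind the same callable signature means the order can be changed per batch (for example, cloud first for low-light footage) without touching the loop.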
The core engine translates visual analysis into a structured 5-7 word, lowercase, underscore-separated title schema, automatically parsing EXIF GPS data to append geographic locations (e.g., turning a random filename into sunset_beach_at_bali.jpg). It also includes structural guardrails like automated garbage output rejection (re-rolling if a model returns a generic phrase like “this is a video of a person”) and deep ledger mapping (WAS -> NOW histories) to allow multi-level batch undo rollbacks without risking file corruption.

Disclosure: I am the sole creator/developer of this project. The app is built as a native standalone .dmg for macOS 11.0+.

If you manage high volumes of raw drone, timeline, or camera media, I’d love to hear your thoughts on the local inference approach or features you think are missing!

Links: Product Page: ai-media-core
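As a rough illustration of the title schema and garbage rejection described above, the sketch below normalizes a caption into a lowercase, underscore-separated name and retries when the caption looks generic. The function names, the blocklist of generic prefixes, and the retry count are assumptions for illustration, not the app's actual code.

```python
import re
from typing import Callable, Optional

# Hypothetical blocklist; the post only names one example of a generic phrase.
GENERIC_PREFIXES = ("this is a", "an image of", "a video of", "a photo of")


def is_garbage(caption: str) -> bool:
    """Treat empty captions and generic openers as unusable output."""
    text = caption.strip().lower()
    return not text or text.startswith(GENERIC_PREFIXES)


def to_title(caption: str, location: Optional[str] = None, max_words: int = 7) -> str:
    """Lowercase the caption, keep at most max_words words, append '_at_<location>'."""
    words = re.findall(r"[a-z0-9]+", caption.lower())[:max_words]
    if location:
        words += ["at"] + re.findall(r"[a-z0-9]+", location.lower())
    return "_".join(words)


def caption_with_retries(generate: Callable[[], str], attempts: int = 3) -> Optional[str]:
    """Call the model again ("re-roll") until it returns a usable caption."""
    for _ in range(attempts):
        caption = generate()
        if not is_garbage(caption):
            return caption
    return None  # caller can drop to a cheaper fallback tier here


print(to_title("Sunset Beach", "Bali") + ".jpg")
```

Returning `None` after the last attempt, rather than raising, leaves the decision about what to do next (skip the file, use a heuristic name) to the caller.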

Originally posted by u/That-Hour-2945 on r/ArtificialInteligence