I’m the developer of this project (solo iOS engineer). I built a pipeline that takes a single user photo plus a motion source (a template or a user-uploaded video) and generates a short dancing video.

**High-level approach**

**Client (iOS / Swift)**

Handles input (photo + optional video), preprocessing (crop/fit), upload, and job tracking. Generation is fully async: users can close the app and get notified when the video is ready.

**Backend (Firebase)**

- Firestore: job state machine (queued → running → completed/failed)
- Cloud Functions: enqueue jobs and trigger workers
- Storage: input/output assets
- Push notifications: notify users when generation is complete

**Inference (RunPod GPU workers)**

Custom pipeline combining:

- motion extraction using a SCAIL-based approach
- identity preservation from the input photo
- video generation using WAN models

**Why async instead of real-time**

Generation takes ~15–20 minutes depending on resolution and model, so I optimized for reliability and cost rather than latency. Users can leave the app and come back once notified.

**Benchmarks (current)**

- ~15–20 min per generation
- GPU cost is the main constraint

**Key lessons**

- Quality >> features. Small improvements in realism matter more than adding options.
- Curated motion templates outperform arbitrary user videos (better pose consistency).
- Async UX + notifications work surprisingly well for long-running jobs.

**Demo / more details**

https://www.producthunt.com/products/danceme

submitted by /u/azamat_valitov
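The queued → running → completed/failed lifecycle described above can be sketched as a small state machine. This is an illustrative sketch only: the real system keeps this state in Firestore and mutates it from Cloud Functions and the GPU worker, and names like `Job` and `advance` are hypothetical, not part of the actual codebase.

```python
# Minimal sketch of the job state machine (hypothetical names).
# In the real pipeline this document lives in Firestore.

from dataclasses import dataclass, field

# Allowed transitions: queued -> running -> completed | failed
TRANSITIONS = {
    "queued": {"running"},
    "running": {"completed", "failed"},
    "completed": set(),  # terminal
    "failed": set(),     # terminal
}

@dataclass
class Job:
    job_id: str
    state: str = "queued"
    history: list = field(default_factory=list)

    def advance(self, new_state: str) -> None:
        """Validate and apply a state transition."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.history.append((self.state, new_state))
        self.state = new_state

    @property
    def is_terminal(self) -> bool:
        return not TRANSITIONS[self.state]

# Typical lifecycle: worker claims the job, runs inference, reports the result.
job = Job("job-123")
job.advance("running")    # GPU worker picked it up
job.advance("completed")  # output video uploaded; push notification goes out
```

Keeping the transitions explicit is what makes the async UX safe: a crashed worker can only ever leave a job in a recoverable, well-defined state.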
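One detail the enqueue/worker split implies but doesn't spell out: claiming a job must be atomic, or two GPU workers could grab the same one. In Firestore that would be a transaction; the sketch below stands in for it with an in-memory dict and a lock, and every name here is hypothetical.

```python
# Sketch of atomic job claiming (hypothetical; a Firestore
# transaction plays the role of the lock in the real backend).

import threading

_lock = threading.Lock()
jobs = {}  # job_id -> {"state": ..., "worker": ...}

def claim_next_job(worker_id: str):
    """Atomically flip the first queued job to running; None if no work."""
    with _lock:
        for job_id, job in jobs.items():
            if job["state"] == "queued":
                job["state"] = "running"
                job["worker"] = worker_id
                return job_id
    return None
```

With ~15–20 minute generations, losing a job to a double claim wastes real GPU money, which is why the claim step, not the inference step, is where the concurrency control lives.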
Originally posted by u/azamat_valitov on r/ArtificialInteligence
