Original Reddit post

Was digging through dynamic-scene reconstruction stuff and ran into one service (won’t name it, not here to shill) that takes video and lets you orbit / pause / fly around the scene in real time inside a normal browser tab. No headset, no install, no plugin.

Under the hood it’s 4D Gaussian Splatting: same idea as 3D-GS (millions of little oriented ellipsoids instead of meshes), but with a time axis so the splats deform per-frame. They quote roughly 12.5 MB per second of footage, which is shockingly small for volumetric.

The part I can’t get a clear answer on: the slick demos seem to come from a multi-camera capture rig (dozens of synchronized cameras around the subject) shown at a broadcast trade show recently. Basically a capture stage. But a lot of the marketing reads “turn any 2D video into 4D.” Those are very different things.

So: has anyone here actually fed a single handheld phone clip into a pipeline like this and gotten a usable navigable scene out? Or is single-cam input still the same hard problem it’s always been (occlusion, no parallax, monocular depth lies), and the magic only kicks in with a synchronized multi-cam rig?

Also curious how it stacks up against the Deformable 3D-GS / 4DGS papers from the last year. Feels like the academic gap is closing fast, but I haven’t seen a fair side-by-side.

More interested in hands-on impressions than marketing reels. If anyone’s poked at the actual pipeline (any vendor, doesn’t matter), drop notes.
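
For scale, here is a rough back-of-envelope on the quoted 12.5 MB per second figure. It assumes 1 MB means 10^6 bytes and that the rate is constant over the clip; neither is confirmed by the vendor, and the function names are just for illustration.

```python
# Back-of-envelope on the quoted ~12.5 MB per second of footage.
# Assumptions (not from the vendor): 1 MB = 10**6 bytes, constant rate.
MB_PER_SECOND = 12.5


def stream_mbit_per_s(mb_per_s: float = MB_PER_SECOND) -> float:
    """Sustained bitrate needed to stream in real time, in Mbit/s."""
    return mb_per_s * 8  # 8 bits per byte


def clip_size_mb(seconds: float, mb_per_s: float = MB_PER_SECOND) -> float:
    """Total payload for a clip of the given length, in MB."""
    return mb_per_s * seconds


if __name__ == "__main__":
    print(f"bitrate: {stream_mbit_per_s():.0f} Mbit/s")
    print(f"60 s clip: {clip_size_mb(60):.0f} MB")
```

Under those assumptions that works out to about 100 Mbit/s sustained and 750 MB for a one-minute clip: small for volumetric, but still a lot more than ordinary 2D video streaming typically needs.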

Originally posted by u/andrewaltair on r/ArtificialInteligence