Two things happened this month that I think are actually the same story told from different layers.

Qualcomm bought Modular for $3.9B specifically to break CUDA lock-in. Mojo and MAX let you write inference code once and run it across Nvidia, AMD, Intel, and Qualcomm silicon without per-chip rewrites. Lattner’s framing is that fragmented software doesn’t scale in a world of heterogeneous hardware. The same week, Google had to cap Meta’s Gemini usage because it didn’t have enough hardware to serve it.

The way I see it, these are different layers of the same core issue: infrastructure is the bottleneck, and it hasn’t caught up with how fast everyone wants to scale AI. CUDA lock-in is a software problem at the hardware layer, and Qualcomm is betting big that fixing it unlocks adoption. Compute scarcity is the same story at the capacity layer: even Google is rationing tokens now.

What’s not getting talked about as much is whether this pattern repeats one layer up, at the data layer. I mean, what’s next? Even with hardware-agnostic inference, you still need to get proprietary, fragmented, multimodal data into a state a model can use. If multi-format data from different software and sources stays this fragmented, it becomes the same bottleneck that hardware used to be (but maybe that issue is easier to solve?).

Curious what people here think.
Originally posted by u/_tnhii on r/ArtificialInteligence
