Original Reddit post

Google DeepMind has released Gemma 4 12B, a new multimodal AI model. It runs locally and can process video, audio, and text without an internet connection, on standard personal computers with just 16 GB of RAM. In terms of performance, it nearly matches the 26B version, which is roughly twice its size. The new model can also write code and perform speech recognition. In a demonstration, it analyzed 313 frames from a five-minute video (sampled at one frame per second) simultaneously with the audio.

Matthias Bastian, a reporter for the tech portal The Decoder, notes that this is the first mid-sized Gemma version with direct audio processing. The model is already available on Hugging Face, Ollama, and LM Studio under the Apache 2.0 license, which makes commercial use easier.

Source: https://the-decoder.com/google-deepminds-gemma-4-12b-squeezes-multimodal-ai-onto-a-laptop-with-just-16-gb-of-ram/

Originally posted by u/andrewaltair on r/ArtificialInteligence