Google’s Gemini AI models have improved by leaps and bounds over the past year, but you can only use Gemini on Google’s terms. The company’s open-weight Gemma models have provided more freedom, but Gemma 3, which launched more than a year ago, is getting a bit stale. Starting today, developers can get started with Gemma 4, which comes in four sizes optimized for local use. Google has also acknowledged developers’ frustrations with AI licensing, so it’s getting rid of Gemma’s custom license.
Like previous generations of its open models, Google designed Gemma 4 to be used on local machines. That can mean many things, of course. The two large Gemma variants, 26B Mixture of Experts and 31B Dense, are designed to run unquantized in bfloat16 format on a single 80GB Nvidia H100 GPU. Sure, that’s a $20,000 AI accelerator, but it’s still local hardware. If quantized to run at lower precision, these large models will fit on consumer GPUs.
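The memory math behind those claims is straightforward to sketch. The snippet below is a back-of-the-envelope estimate (not anything Google has published): it counts only the bytes needed to hold the weights, ignoring activations and the KV cache, which also consume GPU memory in practice.

```python
# Rough weight-memory footprint for Gemma 4's 31B Dense model.
# This counts parameters * bytes-per-parameter only; real inference
# also needs memory for activations and the KV cache.

def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Approximate memory (in GB) needed just to hold the weights."""
    return params * bytes_per_param / 1e9

PARAMS_31B = 31e9

# bfloat16 uses 2 bytes per parameter.
bf16_gb = weight_memory_gb(PARAMS_31B, 2)    # ~62 GB, fits an 80GB H100

# 4-bit quantization packs ~0.5 bytes per parameter.
int4_gb = weight_memory_gb(PARAMS_31B, 0.5)  # ~15.5 GB, consumer-GPU territory

print(f"bfloat16: {bf16_gb:.1f} GB, int4: {int4_gb:.1f} GB")
```

Roughly 62GB in bfloat16 explains why the 31B Dense model targets an 80GB H100, and why a 4-bit quantized copy, at around 15.5GB, lands within reach of a 16GB or 24GB consumer card.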
Google also says it has focused on reducing latency to take full advantage of Gemma’s local processing. The 26B Mixture of Experts model activates only 3.8 billion of its 26 billion parameters during inference, giving it much higher tokens-per-second throughput than similarly sized models. Meanwhile, 31B Dense prioritizes quality over speed, but Google hopes developers will fine-tune it for specific uses.
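The throughput advantage of the Mixture of Experts design can be sketched with simple arithmetic. The comparison below is an illustrative approximation, not a published benchmark: it uses the common rule of thumb that a decoder performs roughly 2 FLOPs per active parameter per generated token, and compares the MoE model against a hypothetical dense model of the same total size.

```python
# Rough per-token compute comparison: the 26B MoE model vs. a
# hypothetical fully dense 26B model. Assumes ~2 FLOPs per active
# parameter per generated token (a common rule of thumb).

TOTAL_PARAMS = 26e9   # all parameters in the model
ACTIVE_PARAMS = 3.8e9 # parameters actually routed to per token

dense_flops_per_token = 2 * TOTAL_PARAMS   # every parameter participates
moe_flops_per_token = 2 * ACTIVE_PARAMS    # only the routed experts participate

speedup = dense_flops_per_token / moe_flops_per_token
print(f"~{speedup:.1f}x fewer FLOPs per token")  # ~6.8x
```

By this estimate, each generated token costs the MoE model nearly seven times less compute than a dense model of the same parameter count, which is where the tokens-per-second advantage comes from.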
The other two Gemma 4 models, Effective 2B (E2B) and Effective 4B (E4B), are aimed at mobile devices. These options were designed to keep memory usage low during inference, running with 2 billion or 4 billion effective parameters, respectively. Google says the Pixel team worked closely with Qualcomm and MediaTek to optimize these models for devices like smartphones, the Raspberry Pi, and the Jetson Nano. Not only do they use less memory and battery than Gemma 3, but Google is also touting “almost zero latency” this time around.
More powerful, more open
All the new Gemma 4 models reportedly leave Gemma 3 in the dust; Google claims these are the most capable models you can run on local hardware. Google says the 31B Dense model will debut at number three on the Arena list of top open AI models, behind GLM-5 and Kimi 2.5. However, even the largest Gemma 4 variant is a fraction of the size of those models, which in theory makes it much cheaper to run.
