Google has released Gemma 4, the latest generation of its open large language models. These models share the same technical and research foundation as the Gemini 3 Pro models, released late last year, which marked a big step up for Google's proprietary large language models. Gemma 4 brings those improvements to the open models, with a focus on advanced reasoning and agentic workflows, and is natively trained on over 140 languages.

Gemma 4, whose name marks the fourth generation, is available in four model sizes: Effective 2B (E2B), Effective 4B (E4B), a 26B Mixture of Experts (MoE), and a 31B dense model. The 26B and 31B models are built to deliver frontier-level intelligence for offline computation on personal systems. The E2B and E4B models, with their smaller parameter counts, are better suited to smartphones, mobile devices, and Internet of Things (IoT) ecosystems, as well as edge devices such as the Raspberry Pi and Nvidia's Jetson.
If you’re wondering how Gemma differs from Gemini, the main difference is that Gemma is an AI processing engine rather than a chatbot-style product. While the underlying technology is largely the same, Gemma is an open model that can be downloaded and run locally for free. It also offers greater flexibility to modify, fine-tune, and customize for specific workflow and computing requirements. Because these models can run locally rather than in the cloud, they also bring data privacy and cost benefits to many applications.
Convenience and flexibility extend to personal, business and enterprise use.
Google has teamed up with partners including Qualcomm, MediaTek, and Nvidia for these models. The new models are released under the Apache 2.0 license, which gives developers, researchers, and commercial entities significant freedom to use, modify, and redistribute the models with minimal restrictions. Previously, that flexibility was comparatively limited.
“Purpose-built for advanced reasoning and agentic workflows, Gemma 4 delivers an unprecedented level of per-parameter intelligence. This success is based on incredible community momentum: since the launch of our first generation, developers have downloaded Gemma more than 400 million times, creating a vibrant Gemmaverse of more than 100,000 variants,” says Clément Farabet, vice president of research at Google DeepMind.
Arena AI, a public, web-based platform that evaluates large language models (LLMs), ranks Gemma 4’s 31 billion parameter model in third place (score 1452), behind GLM-5 (score 1456) and Kimi 2.5 Thinking (score 1453), while the 26 billion parameter model sits in sixth place (score 1441). GLM-5 is built by Chinese AI company Z.ai, while the Kimi model is developed by Chinese company Moonshot AI. OpenAI’s gpt-oss-20b open-weight language model, released in August, has a score of 1318.
Farabet says Gemma 4 outperforms older models 20 times its size. Gemma 4 is expected to handle a variety of generative AI tasks with text, audio, and image input, support over 140 languages, and offer long context windows of up to 128K and 256K tokens. The 31B and 26B parameter models are designed for high-end servers with powerful GPUs, including Nvidia’s H100.
“The 26B and 31B models are designed for high-performance reasoning and developer-centric workflows, making them well-suited for agentic AI. Optimized to deliver cutting-edge, accessible reasoning, these models run efficiently on Nvidia RTX GPUs and DGX Spark – empowering development environments, coding assistants, and agent-driven workflows,” says Michael Fukuyama, product manager at Nvidia.
The 2 billion and 4 billion parameter footprints of the E2B and E4B models, along with optimizations to preserve RAM and battery life, will be critical to the usability of Gemma 4 on mobile devices. Nvidia confirms that the Jetson Orin Nano supports the Gemma 4 E2B and E4B variants, enabling multimodal inference on small, embedded, and power-constrained systems, with the same model family scaling up to Jetson Thor on the Jetson platform.