Google Introduces Gemma 4: Versatile Models Spanning Smartphones to Workstations

Derived from the same research as Gemini 3, the new models range from a 2B edge model that can run on a Raspberry Pi to a 31B dense model holding third place on the Arena AI open-model leaderboard. The Apache 2.0 license marks a significant change from previous Gemma versions.

Google has unveiled Gemma 4, the latest in its open-weight model series, available in four sizes to handle tasks from on-device smartphone inference to workstation-level applications.

The models are built from the same research and technology as Gemini 3, Google's advanced proprietary model, and are released under the more permissive Apache 2.0 license, a departure from previous Gemma releases that Hugging Face co-founder Clément Delangue described as "a huge milestone."

Demis Hassabis, CEO of Google DeepMind, described the new models as “the best open models in the world for their respective sizes.”

The four models include the Effective 2B (E2B) and Effective 4B (E4B) edge models, meant for use on phones, Raspberry Pi, and Jetson Nano hardware, developed with the Pixel team, Qualcomm, and MediaTek; and the 26B Mixture-of-Experts (MoE) and 31B Dense models, targeted for offline developer hardware and consumer GPUs.

The 31B Dense model is third among open models on the Arena AI text leaderboard, with the 26B MoE at sixth. Google asserts that these larger models surpass models up to 20 times their size on this benchmark.

The 31B model's unquantized weights fit on a single 80GB Nvidia H100 GPU; quantized versions run on consumer hardware.

All models are multimodal, natively handling video and images, and trained in over 140 languages. The E2B and E4B models also support native audio input for speech recognition. Context windows are 128K tokens for the edge models and 256K for the larger versions.

In terms of capability, Google emphasizes improvements in multi-step reasoning, native function-calling, structured JSON output for agent workflows, and offline code generation. Performance-wise, the Android Developers Blog notes that the E2B model is three times faster than the E4B, and the edge family overall is up to four times faster than previous Gemma versions while using up to 60% less battery.
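Structured JSON output of this kind typically means the model emits a machine-parseable tool call instead of free text, which the agent then parses and executes. A minimal, model-agnostic sketch of the dispatch side is below; all names (`get_weather`, the reply format) are illustrative and not a Gemma-specific API.

```python
import json

# Registry of tools the agent exposes to the model (illustrative names).
TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",
}

def dispatch(model_reply: str) -> str:
    """Parse a JSON function call emitted by the model and invoke it."""
    call = json.loads(model_reply)  # e.g. {"name": "...", "arguments": {...}}
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

# A simulated structured reply, standing in for actual model output:
reply = '{"name": "get_weather", "arguments": {"city": "Amsterdam"}}'
print(dispatch(reply))  # Sunny in Amsterdam
```

In practice the model would be prompted with the tool schemas and its raw output fed into a dispatcher like this; native function-calling means the model is trained to produce such JSON reliably without extra parsing heuristics.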

The E2B and E4B models also serve as the base for Gemini Nano 4, Google’s next-gen on-device model for Android, expected on consumer devices later this year.

Since its initial release, Gemma has achieved over 400 million downloads and more than 100,000 community-created variants, demonstrating large-scale developer adoption.

Gemma 4 is now available on Hugging Face, Kaggle, and Ollama, with the 31B and 26B models offered via Google AI Studio and the edge models through AI Edge Gallery.

The move to Apache 2.0 licensing is the key commercial development of the launch: it removes the constraints that restricted certain enterprise and commercial uses under the older Gemma terms.
