Google Gemma 4 Open-Weight Models: Apache 2.0 and Enhanced [[concepts/capabilities|AI Capabilities]]

Clip title: Gemma 4 Has Landed!
Author / channel: Sam Witteveen
URL: https://www.youtube.com/watch?v=5aqF1HVpjdc
Summary
Google has launched Gemma 4, a new suite of open-weight models that significantly advance their Gemma series, primarily by adopting a developer-friendly Apache 2.0 license. This license is a major highlight, allowing users unprecedented freedom to use, modify, distribute, and commercially deploy Google’s best open models without restrictive clauses. Gemma 4 comprises four distinct models with enhanced capabilities across multimodality, thinking (reasoning), native audio processing, and robust function calling. This move is seen as Google’s direct response to previous criticisms regarding restrictive licensing on earlier Gemma versions, aiming to foster broader adoption and innovation within the open-source AI community.
The Gemma 4 models are split into two tiers: “Workstation Models” and “Edge Models.” The Workstation tier includes a 31-billion-parameter fully dense model (31B Dense) and a 26-billion-parameter Mixture-of-Experts model (26B-A4B MoE), in which only about 4 billion parameters are active per token, routed across 128 small experts. These are designed for high-performance inference. The Edge tier features smaller, highly efficient models (E2B and E4B) with roughly 2 billion and 4 billion effective parameters, respectively. These compact models are optimized to run on resource-constrained devices such as phones, Raspberry Pis, and Jetson Nanos, making them well suited to on-device AI assistants and applications.
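To make the dense-versus-MoE trade-off concrete, here is a quick back-of-envelope comparison that uses only the parameter counts quoted above; the 8-bit weight assumption and the resulting memory figures are illustrative, not measured numbers.

```python
# Back-of-envelope comparison of the two Workstation-tier variants.
# Parameter counts come from the video; the precision choice is an assumption.

DENSE_TOTAL_PARAMS = 31e9   # 31B Dense: every parameter is used for every token
MOE_TOTAL_PARAMS   = 26e9   # 26B-A4B MoE: total parameters that must be stored
MOE_ACTIVE_PARAMS  = 4e9    # ...but only ~4B are active per token (128 experts)
BYTES_PER_PARAM    = 1      # assume 8-bit quantized weights (illustrative only)

def weight_gb(n_params: float, bytes_per_param: float = BYTES_PER_PARAM) -> float:
    """Rough weight-memory footprint in GB (ignores KV cache and activations)."""
    return n_params * bytes_per_param / 1e9

print(f"Dense 31B  : ~{weight_gb(DENSE_TOTAL_PARAMS):.0f} GB of weights, "
      f"31B params touched per token")
print(f"MoE 26B-A4B: ~{weight_gb(MOE_TOTAL_PARAMS):.0f} GB of weights, "
      f"but only {MOE_ACTIVE_PARAMS / 1e9:.0f}B params touched per token")

# Per-token compute scales with *active* parameters, so the MoE model needs
# roughly 31/4 ~ 8x less compute per decoded token than the dense model,
# while still requiring memory for all 26B parameters.
print(f"Approx. per-token compute ratio (dense / MoE): "
      f"{DENSE_TOTAL_PARAMS / MOE_ACTIVE_PARAMS:.1f}x")
```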
A key architectural advancement in Gemma 4, drawing from Google’s Gemini 3 research, is the native integration of multimodality and enhanced reasoning. Unlike previous models that often required external tools for capabilities beyond text or text-plus-vision, Gemma 4 natively supports vision, audio, and function calling within a single model family. The new “thinking” capability allows models to perform internal chain-of-thought reasoning before generating an output, significantly improving performance on complex benchmarks and enabling reasoning across modalities, including audio for the first time. The integrated function calling leverages FunctionGemma research, optimizing models for multi-turn agentic workflows and allowing them to maintain context and utilize external tools effectively.
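The multi-turn agentic loop described here can be sketched in a few lines. The snippet below is a minimal, library-free illustration: `call_model`, the JSON tool-call format, and the `get_weather` tool are all placeholders for this sketch, not Gemma 4's actual chat template or function-calling schema.

```python
import json

def get_weather(city: str) -> str:
    """A toy tool the model is allowed to call."""
    return f"22C and sunny in {city}"

TOOLS = {"get_weather": get_weather}

def call_model(messages: list[dict]) -> str:
    """Placeholder for real inference (e.g. a local Gemma 4 Edge model).
    Here we fake one tool call, then a final answer once a tool result exists."""
    if not any(m["role"] == "tool" for m in messages):
        return json.dumps({"tool": "get_weather", "arguments": {"city": "Berlin"}})
    return "It's 22C and sunny in Berlin right now."

def run_agent(user_prompt: str, max_turns: int = 4) -> str:
    """Multi-turn loop: the model may request tools several times before answering."""
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_turns):
        reply = call_model(messages)
        try:
            call = json.loads(reply)      # model asked to use a tool
        except json.JSONDecodeError:
            return reply                  # plain text means a final answer
        result = TOOLS[call["tool"]](**call["arguments"])
        # Feed the tool result back so the model keeps full conversational context.
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "tool", "content": result})
    return "Gave up after max_turns."

print(run_agent("What's the weather in Berlin?"))
```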
Specifically, the Edge models (E2B & E4B) boast significantly better native audio support compared to their predecessors. They feature a conformer-layer ASR encoder for improved audio recognition accuracy, built-in speech recognition, and speech-to-translated-text capabilities. The audio encoder is also 50% smaller and offers faster processing, crucial for low-latency edge deployments. For vision, Gemma 4 handles images at their native aspect ratio and various resolutions, supporting interleaved multi-image inputs. This enhances capabilities for Optical Character Recognition (OCR), object recognition, document understanding, and improved video understanding with temporal reasoning. Gemma 4 is available on Hugging Face and Google Cloud, with Cloud Run now supporting NVIDIA RTX Pro 6000 (Blackwell) GPUs for serverless deployment of even the larger models.
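As a rough sketch of how the audio and vision capabilities might be exercised from Python, the snippet below uses standard Hugging Face `transformers` pipelines; the checkpoint id is a hypothetical placeholder, and whether the released checkpoints expose these exact pipeline tasks is an assumption, so check the actual model cards on Hugging Face before relying on it.

```python
# Sketch of exercising Gemma 4 Edge audio + vision via Hugging Face transformers.
# The checkpoint id is a placeholder and the pipeline tasks are assumptions --
# consult the real model cards on the Hub for supported tasks and input formats.
from transformers import pipeline

GEMMA4_EDGE = "google/gemma-4-e2b"  # hypothetical id, for illustration only

# 1) Speech recognition (the Edge models ship a conformer-layer ASR encoder).
asr = pipeline("automatic-speech-recognition", model=GEMMA4_EDGE)
print(asr("meeting_clip.wav")["text"])

# 2) Interleaved image + text prompt (native aspect ratio, OCR / document QA).
vlm = pipeline("image-text-to-text", model=GEMMA4_EDGE)
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "invoice_page.png"},
        {"type": "text", "text": "Extract the total amount due from this invoice."},
    ],
}]
result = vlm(text=messages, max_new_tokens=64, return_full_text=False)
print(result[0]["generated_text"])
```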
Related Concepts
- Open-weight models — Wikipedia
- Apache 2.0 license — Wikipedia
- Native audio processing — Wikipedia
- Function calling — Wikipedia
- Multimodality — Wikipedia
- Chain-of-thought reasoning — Wikipedia
- Mixture-of-Experts (MoE) — Wikipedia
- Dense architecture — Wikipedia
- Agentic workflows — Wikipedia
- ASR (Automatic Speech Recognition) — Wikipedia
- Conformer-layer architecture — Wikipedia
- Temporal reasoning — Wikipedia
- OCR (Optical Character Recognition) — Wikipedia
- Serverless deployment — Wikipedia
- On-device AI — Wikipedia
- Edge computing — Wikipedia
- Multimodal inference — Wikipedia
- Speech-to-text — Wikipedia
- Computer vision — Wikipedia
- FunctionGemma — Wikipedia