Google Gemma 4
Open-weight large language model family developed by Google, featuring advanced architectures with native Multi-Token Prediction (MTP) capabilities.
Developments
- MTP Drafter Models Released (2026-05-06)
  - Google published specialized drafter models for the Gemma 4 family to support speculative decoding.
  - The drafters are designed to significantly accelerate inference throughput, with optimizations for local deployment.
  - Evaluations include performance comparisons against Google DeepMind DFlash.
  - Source: Google Gemma 4 MTP Drafters: Accelerating Inference Speed with Speculative Decoding
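
Illustration of the general draft-then-verify idea behind speculative decoding, as a minimal toy sketch. It is not the Gemma 4 MTP drafter implementation and uses no Google API. The names `target_model`, `draft_model`, `greedy_decode`, and `speculative_decode` are hypothetical, both "models" are deterministic stand-in functions, and the acceptance rule is the simple greedy-match variant rather than probabilistic rejection sampling.

```python
from typing import Callable, List, Tuple

# A "model" here is any function mapping a token prefix to its greedy next token.
Model = Callable[[List[int]], int]

VOCAB = 50


def target_model(prefix: List[int]) -> int:
    # Toy stand-in for the large model (assumption: deterministic, greedy).
    return (sum(prefix) * 31 + len(prefix)) % VOCAB


def draft_model(prefix: List[int]) -> int:
    # Toy stand-in for the small drafter: agrees with the target except
    # when the prefix length is a multiple of 4, where it guesses wrong.
    guess = target_model(prefix)
    return guess if len(prefix) % 4 else (guess + 1) % VOCAB


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    # Baseline: one model call per generated token.
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(
    target: Model, draft: Model, prompt: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], int]:
    tokens = list(prompt)
    goal = len(prompt) + n_new
    rounds = 0  # number of verification rounds
    while len(tokens) < goal:
        rounds += 1
        # 1. The drafter proposes k tokens autoregressively (cheap).
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))
        # 2. The target checks each proposed position. A real system would
        #    score all k positions in one batched forward pass; this toy
        #    version calls the target once per position for clarity.
        new: List[int] = []
        for tok in proposal:
            expected = target(tokens + new)
            if tok == expected:
                new.append(tok)       # draft token accepted
            else:
                new.append(expected)  # first mismatch: keep the target's token
                break
        else:
            # Every draft token matched, so the target adds one bonus token.
            new.append(target(tokens + new))
        tokens.extend(new)
    return tokens[:goal], rounds


if __name__ == "__main__":
    prompt = [1, 2, 3]
    baseline = greedy_decode(target_model, prompt, 20)
    output, rounds = speculative_decode(target_model, draft_model, prompt, 20)
    print("identical to greedy target output:", output == baseline)
    print("verification rounds:", rounds)
```

In this greedy-match variant, every emitted token is the target's own choice for its prefix, so the output matches plain greedy decoding with the target. Any speedup comes from needing fewer verification rounds than generated tokens, which depends on how often the drafter agrees with the target.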