Multi-Token Prediction (MTP) Drafter Models

Multi-Token Prediction (MTP) drafter models are auxiliary architectures employed in speculative-decoding pipelines to accelerate large-language-model inference by predicting multiple future tokens in parallel.

Mechanism

  • Parallel Proposal: MTP drafters generate a trajectory of several future tokens in a single forward pass, in contrast to the token-by-token generation of a sequential autoregressive model.
  • Verification Loop: The target model checks the whole proposed sequence in one forward pass. The longest prefix consistent with the target distribution is accepted in bulk; the first divergent token and everything after it are rejected.
  • Compute Amortization: Each target-model call can commit several tokens at once, so the number of expensive target calls falls as the token acceptance rate rises, lowering latency while preserving output quality.
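
The three steps above can be sketched as a small draft-and-verify loop. This is a minimal illustration, not a reference implementation. It assumes greedy verification (a drafted token is accepted only if it equals the target's top choice), whereas sampling-based pipelines typically use a rejection-sampling rule instead. The names draft_fn, target_fn, speculative_step, generate, and the toy models are hypothetical and not taken from any particular library.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interfaces, assumed for illustration only:
#   draft_fn(prefix, k)      -> k proposed tokens from ONE drafter pass
#   target_fn(prefix, draft) -> len(draft) + 1 tokens: the target's greedy
#                               choice after prefix + draft[:i], for each i
DraftFn = Callable[[Sequence[int], int], List[int]]
TargetFn = Callable[[Sequence[int], Sequence[int]], List[int]]


def speculative_step(prefix: Sequence[int], draft_fn: DraftFn,
                     target_fn: TargetFn, k: int) -> List[int]:
    """Run one draft-and-verify round and return the tokens to commit."""
    draft = draft_fn(prefix, k)                # parallel proposal
    target_choice = target_fn(prefix, draft)   # single target call
    committed: List[int] = []
    for proposed, wanted in zip(draft, target_choice):
        if proposed != wanted:
            # First divergence: keep the target's token, drop the rest.
            committed.append(wanted)
            return committed
        committed.append(proposed)
    # Every drafted token matched, so the target's extra token is free.
    committed.append(target_choice[len(draft)])
    return committed


def generate(prompt: Sequence[int], draft_fn: DraftFn, target_fn: TargetFn,
             k: int, max_new_tokens: int) -> Tuple[List[int], int]:
    """Generate tokens and count how many target calls were needed."""
    tokens = list(prompt)
    target_calls = 0
    while len(tokens) - len(prompt) < max_new_tokens:
        tokens.extend(speculative_step(tokens, draft_fn, target_fn, k))
        target_calls += 1
    return tokens[:len(prompt) + max_new_tokens], target_calls


# Toy stand-ins for real models: the "target" counts upward modulo 10.
def toy_target_fn(prefix: Sequence[int], draft: Sequence[int]) -> List[int]:
    out = []
    for i in range(len(draft) + 1):
        context = list(prefix) + list(draft[:i])
        out.append((context[-1] + 1) % 10)
    return out


# The toy drafter imitates the target but is wrong whenever the answer is 5.
def toy_draft_fn(prefix: Sequence[int], k: int) -> List[int]:
    out, last = [], prefix[-1]
    for _ in range(k):
        nxt = (last + 1) % 10
        if nxt == 5:
            nxt = 0  # deliberate mistake to exercise the rejection path
        out.append(nxt)
        last = nxt
    return out


tokens, calls = generate([0], toy_draft_fn, toy_target_fn,
                         k=3, max_new_tokens=12)
```

In this toy run the output matches what the target alone would produce, but it takes 4 target calls for 12 new tokens instead of 12. The round in which the drafter errs still commits one token, because the target's own choice replaces the rejected one.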

Implementation & Ecosystem