Token Generation Speed

Token generation speed, usually measured in tokens per second (TPS), describes how quickly a large language model (LLM) produces output during autoregressive inference. Low generation speed is a primary bottleneck for both user experience and system throughput.
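As a rough illustration of the metric, the sketch below computes tokens per second as the number of generated tokens divided by elapsed wall-clock time. The names `tokens_per_second`, `measure_stream`, and `fake_model` are hypothetical and introduced only for this example; `fake_model` is a simulated generator, not the API of any real inference backend.

```python
import time
from typing import Callable, Iterable, Tuple


def tokens_per_second(num_tokens: int, elapsed_seconds: float) -> float:
    """Return generation speed as tokens divided by wall-clock seconds."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_tokens / elapsed_seconds


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[int, float]:
    """Consume a token stream and return (token_count, elapsed_seconds).

    `tokens` is a hypothetical stand-in for whatever streaming iterator
    an inference backend exposes; it is not a real library API.
    """
    start = clock()
    count = 0
    for _ in tokens:
        count += 1
    return count, clock() - start


if __name__ == "__main__":
    def fake_model(n: int = 20, delay: float = 0.01):
        # Simulated generator: sleeps to imitate per-token decode latency.
        for i in range(n):
            time.sleep(delay)
            yield f"tok{i}"

    count, elapsed = measure_stream(fake_model())
    speed = tokens_per_second(count, elapsed)
    print(f"{count} tokens in {elapsed:.3f} s -> {speed:.1f} tokens/s")
```

The `clock` parameter exists only so the measurement can be checked deterministically; in ordinary use the default `time.perf_counter` is sufficient.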

Key Determinants

Optimization Techniques

References

  • llama.cpp documentation on MTP implementation.
  • Tim Carambat’s analysis of MTP integration in 2026.