Vocabulary Size

The total number of unique tokens present within a model’s Tokenizer.

Architectural Impact

  • Defines the input/output dimensionality for the Embedding Matrix and the final output projection layer.
  • Directly scales the number of parameters within the model’s weight tensors.

LLM Inference & Performance

2026 04 22 LLM Inference Engines Memory Mapping and Performance Optimization

Source Notes