Vocabulary Size

The total number of unique tokens recognized by a model’s Tokenizer.

Architectural Impact

  • Defines the input/output dimensionality for the Embedding Matrix and the final output projection layer.
  • Directly scales the parameter count of those layers: each contributes roughly vocab_size × hidden_size weights to the model’s weight tensors.
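The scaling above can be sketched with simple arithmetic. The configuration values below (vocab_size, hidden_size) are illustrative assumptions, not any specific released model, and the sketch assumes an untied output head (separate embedding and projection weights):

```python
# Parameter count contributed by the vocabulary, for a hypothetical
# model configuration (illustrative values only).
vocab_size = 32_000   # assumed tokenizer vocabulary size
hidden_size = 4_096   # assumed model width (d_model)

# Input Embedding Matrix: one hidden_size-dimensional row per token.
embedding_params = vocab_size * hidden_size

# Final output projection (untied): maps hidden states back to vocab logits.
output_head_params = hidden_size * vocab_size

total = embedding_params + output_head_params
print(f"embedding params:   {embedding_params:,}")      # 131,072,000
print(f"output head params: {output_head_params:,}")    # 131,072,000
print(f"vocab-dependent:    {total:,}")                 # 262,144,000
```

Doubling the vocabulary doubles both terms, which is why vocabulary size shows up directly in a model’s parameter budget.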

LLM Inference & Performance

  • Memory and Loading: A larger vocabulary enlarges the embedding and output-projection tensors that must be loaded and managed via Memory Mapping during LLM Inference (see 2026 04 22 LLM Inference Engines Memory Mapping and Performance Optimization).
  • Computational Complexity: Affects Performance Optimization strategies, as a high vocabulary cardinality increases the computational overhead required for calculating Logits and the Softmax distribution.
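The softmax cost noted above can be seen in a minimal sketch: every step of a numerically stable softmax touches all vocab_size logits, so per-token sampling cost grows linearly with vocabulary size. The function below is a toy pure-Python illustration, not an inference-engine implementation:

```python
import math

def softmax(logits):
    # Numerically stable softmax. Each pass below iterates over every
    # logit, so the work per decoded token is O(vocab_size).
    m = max(logits)                                # 1 pass
    exps = [math.exp(x - m) for x in logits]       # 1 pass
    total = sum(exps)                              # 1 pass
    return [e / total for e in exps]               # 1 pass

# Toy logits for a 5-token vocabulary; a real model emits one logit
# per vocabulary entry at every decoding step.
probs = softmax([2.0, 1.0, 0.5, -1.0, 0.0])
print(round(sum(probs), 6))  # 1.0 -- probabilities sum to one
```

Because this distribution is recomputed for every generated token, a large vocabulary adds overhead to each decoding step, not just to model loading.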

Source Notes

  • 2026-04-22: [[lab-notes/2026-04-22-LLM-Inference-Engines-Memory-Mapping-and-Performance-Optimization|LLM Inference: Engines, Memory Mapping, and Performance Optimization]]