QKV System

The QKV system refers to the learned linear projections that generate Query, Key, and Value vectors, which are the core inputs to the Scaled Dot-Product Attention mechanism of Transformer architectures. It lets the model compute dynamic contextual embeddings by allowing each token to attend to relevant information from other tokens.

Mechanics

  • Projections: Input embeddings X are transformed via learned weight matrices (W_Q, W_K, W_V) into Query, Key, and Value subspaces: Q = X W_Q, K = X W_K, V = X W_V.
  • Attention Computation: Similarity scores between Q and K (dot products scaled by sqrt(d_k) and normalized with softmax) determine attention weights; these weights are applied to V to aggregate information.
  • Contextualization: The weighted sum of V vectors produces the output representation, embedding context from the whole sequence into each token position.
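
The three steps above can be sketched in a few lines of plain Python. This is a minimal single-head illustration, not a reference implementation: the function names (matmul, softmax, qkv_attention), the toy dimensions, and the random weights are all assumptions made for the example, and real models use trained weights, batching, and usually multiple heads.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(s - peak) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def qkv_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product attention (illustrative sketch).

    x:   (n_tokens, d_model) input embeddings
    w_q: (d_model, d_k), w_k: (d_model, d_k), w_v: (d_model, d_v)
    Returns (output, weights).
    """
    # Projections: Q = X W_Q, K = X W_K, V = X W_V
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    v = matmul(x, w_v)

    # Attention computation: softmax(Q K^T / sqrt(d_k)), row by row
    d_k = len(w_q[0])
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]

    # Contextualization: each output row is a weighted sum of V rows
    output = matmul(weights, v)
    return output, weights


# Toy setup with hypothetical sizes and random (untrained) weights.
rng = random.Random(0)


def random_matrix(rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


n_tokens, d_model, d_k, d_v = 4, 8, 5, 6
x = random_matrix(n_tokens, d_model)
w_q = random_matrix(d_model, d_k)
w_k = random_matrix(d_model, d_k)
w_v = random_matrix(d_model, d_v)

output, weights = qkv_attention(x, w_q, w_k, w_v)
```

Here weights has one row per token, and each row is a probability distribution over all tokens in the sequence; output has one d_v-dimensional vector per token. No masking is applied in this sketch, so every token can attend to every other token.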

Resources & Insights