Contextual Embeddings
Dynamic vector representations of tokens in which each embedding depends on the surrounding sequence, letting the model resolve polysemy and capture long-range dependencies. Unlike static embeddings, which are read from a fixed lookup table, these are computed on the fly by a forward pass through the model.
Mechanism & Properties
- Transformer Architecture: Contextual embeddings emerge from [[concepts/self-attention]] layers within Transformer models. Each layer refines token representations by aggregating information from other positions in the sequence.
- QKV Computation: The Attention Mechanism projects inputs into Query, Key, and Value spaces. Attention scores are derived from dot products of Q and K (scaled by the square root of the key dimension in the standard Transformer), normalized via softmax, and applied to V to compute a weighted aggregation of context.
- Dynamic Representation: A single token yields distinct vectors depending on its neighbors, capturing semantic nuance absent from fixed-lookup embeddings.
- Layer-wise Evolution: Contextual depth increases through stacked layers; early layers tend to capture local syntax and adjacency, while deeper layers tend to model global semantics and abstract relationships.
- Visual Intuition: 3Blue1Brown’s breakdown clarifies how attention weights function as dynamic focus mechanisms to construct rich, position-aware embeddings.
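
Worked sketch: the QKV computation and the "same token, different vector" property can be illustrated with a single attention head in plain Python. This is a minimal sketch under loud assumptions, not a real model: the 3-dimensional `STATIC` embeddings are made-up values, the projection matrices are set to the identity (real Transformers learn separate Q, K, and V projections), and positional information, multiple heads, and stacked layers are all omitted. The names `self_attention` and `contextualize` are hypothetical helpers introduced only for this example.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over token vectors X."""
    Q = [matvec(Wq, x) for x in X]
    K = [matvec(Wk, x) for x in X]
    V = [matvec(Wv, x) for x in X]
    d_k = len(K[0])
    outputs = []
    for q in Q:
        # Score this position against every position, then normalize.
        scores = [dot(q, k) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted sum of value vectors gives the contextual embedding.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))]
        )
    return outputs

# Assumption: toy static embeddings with invented values.
STATIC = {
    "bank":  [1.0, 0.0, 1.0],
    "river": [0.0, 1.0, 0.0],
    "money": [1.0, 1.0, 0.0],
}

# Assumption: identity projections stand in for learned Wq, Wk, Wv.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

def contextualize(tokens):
    """Map tokens to contextual vectors via one attention pass."""
    X = [STATIC[t] for t in tokens]
    return self_attention(X, IDENTITY, IDENTITY, IDENTITY)

bank_near_river = contextualize(["river", "bank"])[1]
bank_near_money = contextualize(["money", "bank"])[1]

print("static bank:     ", STATIC["bank"])
print("bank | river ctx:", bank_near_river)
print("bank | money ctx:", bank_near_money)
```

The static vector for "bank" is identical in both calls, yet the two output vectors differ because each is a softmax-weighted mixture of the value vectors present in its own sequence. Stacking several such passes, each with its own learned projections, is what the layer-wise evolution bullet refers to.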