
Contextual Embeddings

Dynamic vector representations of tokens in which each embedding depends on the surrounding sequence context, which helps resolve polysemy and capture long-range dependencies. Unlike static embeddings, which assign one fixed vector per token, these are computed at run time by the model from the full input sequence.

Mechanism & Properties

Transformer Architecture: Contextual embeddings emerge from [[concepts/self-attention]] layers within Transformer models. Each layer refines token representations by aggregating information from other positions in the sequence.
QKV Computation: The attention mechanism projects each input vector into Query, Key, and Value spaces through learned linear maps. Attention scores are dot products of Q and K, scaled by the square root of the key dimension, normalized via softmax, and applied to V to produce a weighted aggregation of context.
Dynamic Representation: A single token yields distinct vectors depending on neighbors, capturing semantic nuance absent in fixed-lookup embeddings.
Layer-wise Evolution: Representations become more contextualized through stacked layers; early layers tend to capture local syntax and adjacency, while deeper layers tend to model global semantics and more abstract relationships.
Visual Intuition: 3Blue1Brown’s breakdown clarifies how attention weights function as dynamic focus mechanisms to construct rich, position-aware embeddings.
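
The properties above can be illustrated with a minimal single-head, single-layer self-attention sketch in NumPy. Everything named here is an assumption made for illustration: the four-word toy vocabulary, the random `static_table`, and the random matrices `W_q`, `W_k`, `W_v` standing in for learned projections. Positional encodings, multiple heads, and stacked layers are left out, so this is not how any particular model is implemented; it only shows that the same token ("bank") starts from one static vector yet ends up with different output vectors in different contexts.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 8  # embedding width (arbitrary choice for this sketch)

# Hypothetical toy vocabulary with a static (fixed-lookup) embedding table.
vocab = {"the": 0, "river": 1, "money": 2, "bank": 3}
static_table = rng.normal(size=(len(vocab), d_model))

# Random stand-ins for the learned Q, K, V projections.
# Dividing by sqrt(d_model) keeps the projected values at a moderate scale.
W_q = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
W_k = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
W_v = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens):
    """Return (contextual vectors, attention weights) for a token list."""
    X = static_table[[vocab[t] for t in tokens]]  # (seq_len, d_model)
    Q, K, V = X @ W_q, X @ W_k, X @ W_v           # project into Q, K, V
    scores = Q @ K.T / np.sqrt(d_model)           # scaled dot products
    weights = softmax(scores, axis=-1)            # each row sums to 1
    return weights @ V, weights                   # weighted sum of values


ctx_a, w_a = self_attention(["the", "river", "bank"])
ctx_b, w_b = self_attention(["the", "money", "bank"])

# "bank" is the last token in both sequences.
bank_a, bank_b = ctx_a[2], ctx_b[2]
print("same static vector:    ", True)  # one table row serves both sentences
print("same contextual vector:", bool(np.allclose(bank_a, bank_b)))
```

The static lookup returns an identical row for "bank" in both sentences, while the attention output mixes in value vectors from the neighbors ("river" versus "money"), so the two contextual vectors differ.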

Sources & Notes

Transformer Attention Mechanism Explained: Contextual Embeddings and QKV System

NemoClaw Knowledge Wiki
