Dense Vectors
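Dense vectors, most often encountered as embeddings, are fixed-length numerical representations in which most or all components are non-zero real values, and the dimensions together encode latent semantic features. Sparse vectors such as one-hot encodings are mostly zeros and treat every pair of distinct items as equally unrelated. Dense vectors instead place related items near one another, which lets models capture semantic similarity and relationships between data points.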
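The contrast with one-hot encoding can be made concrete with a small sketch. The three-word vocabulary and the 3-dimensional dense values below are invented for illustration; in practice, embedding values are learned by a model rather than written by hand.

```python
# Toy vocabulary; the words and the dense values are hypothetical.
vocab = ["cat", "dog", "car"]

def one_hot(word):
    """Sparse representation: a single 1.0 at the word's index, 0.0 elsewhere."""
    return [1.0 if w == word else 0.0 for w in vocab]

# Hypothetical dense embedding table (real values would be learned).
dense = {
    "cat": [0.8, 0.6, 0.1],
    "dog": [0.7, 0.7, 0.2],
    "car": [0.1, 0.2, 0.9],
}

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# One-hot vectors of distinct words are orthogonal, so they carry no similarity signal.
print(dot(one_hot("cat"), one_hot("dog")))  # 0.0

# Dense vectors can overlap, so related words can score higher than unrelated ones.
print(dot(dense["cat"], dense["dog"]) > dot(dense["cat"], dense["car"]))  # True
```

Note that the one-hot vectors grow with the vocabulary size, while the dense table keeps a fixed width no matter how many words are added.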
Key Characteristics
- Continuity: Values are real numbers, allowing for gradient-based optimization.
- Dimensionality: Typically hundreds to thousands of dimensions, balancing expressiveness and computational cost.
- Semantic Proximity: Vectors representing similar concepts are closer in vector space (measured via cosine similarity or Euclidean distance).
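- Generalization: Because relationships tend to be encoded as directions in the space, models can infer relationships not explicitly seen during training. The classic Word2Vec example is king - man + woman ≈ queen, which holds only approximately.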
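The proximity measures and the analogy arithmetic listed above can be sketched in a few lines. The 2-dimensional vectors here are hand-picked assumptions (axis 0 loosely standing for "royalty", axis 1 for "gender") so that the analogy works out. Real embeddings are learned, have far more dimensions, and satisfy such analogies only approximately.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical hand-made embeddings, for illustration only.
emb = {
    "king":  [0.9, 0.8],
    "queen": [0.9, 0.2],
    "man":   [0.1, 0.8],
    "woman": [0.1, 0.2],
    "apple": [0.2, 0.9],
}

# Analogy arithmetic: king - man + woman, computed component by component.
target = [k - m + w for k, m, w in zip(emb["king"], emb["man"], emb["woman"])]

# Pick the most similar word, excluding the three input words
# (a common convention when evaluating analogies).
candidates = [w for w in emb if w not in ("king", "man", "woman")]
nearest = max(candidates, key=lambda w: cosine_similarity(target, emb[w]))
print(nearest)
```

Cosine similarity ignores vector length and compares direction only, while Euclidean distance is sensitive to both. For vectors normalized to unit length, the two measures produce the same ranking.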
Applications
- Natural Language Processing (NLP): Word, sentence, and document embeddings.
- Recommendation Systems: User and item latent factor models.
- Computer Vision: Feature extraction for image classification.
- Search and Retrieval: Semantic search beyond keyword matching.
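As an illustration of the search and retrieval use case, the following is a minimal brute-force sketch of semantic search: every document vector is scored against a query vector and the top matches are returned. The document titles, their vectors, the query vector, and the `search` helper are all hypothetical. A real system would obtain vectors from an embedding model and would likely use a vector database or an approximate nearest-neighbor index instead of a linear scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical document embeddings (hand-made, not produced by a real model).
docs = {
    "How to train a puppy":   [0.9, 0.1, 0.0],
    "Best dog food brands":   [0.8, 0.3, 0.1],
    "Car engine maintenance": [0.0, 0.2, 0.9],
}

def search(query_vec, k=2):
    """Return the titles of the k documents most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), title) for title, vec in docs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [title for _, title in scored[:k]]

# Stand-in for the embedding of a query such as "caring for a dog".
query = [0.9, 0.1, 0.1]
print(search(query))
```

The query shares no keywords with the stored titles in this setup; the ranking depends only on how close the vectors are, which is what distinguishes semantic search from keyword matching.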
Related Concepts
- Word2Vec
- Transformers
- Vector Database
- Cosine Similarity
References & Notes
- Vector Embeddings: Semantic Representation for NLP and AI: Overview by Thu Vu (2026) of text embeddings as numerical representations of words, phrases, and documents.