Word Embeddings

Word embeddings are a type of word representation that allows words with similar meaning to have similar representations. They are a distributed representation for text that is perhaps one of the key breakthroughs in the history of NLP.

Core Concepts

  • Numerical Representation: Converts discrete tokens (words, phrases, documents) into dense vectors of real numbers.
  • Semantic Proximity: Words with similar meanings are located close to each other in the vector space Vector Space Model.
  • Dimensionality Reduction: Maps high-dimensional sparse One-Hot Encoding vectors to lower-dimensional dense vectors, capturing syntactic and semantic information.

Key Algorithms & Models

  • Word2Vec: Includes CBOW and Skip-gram architectures.
  • GloVe: Global Vectors for Word Representation, leveraging global matrix factorization.
  • Transformer-based models (e.g., bert, GPT) generate contextualized embeddings.

Applications

Integration & Notes