Numerical Representations
Numerical representations refer to the mapping of non-numerical data (such as text, images, or categorical labels) into vector spaces where mathematical operations can be performed. This process is fundamental to Machine Learning and Deep Learning, enabling algorithms to process and analyze unstructured data.
Core Types
- Categorical Encoding: Methods like One-Hot Encoding, Label Encoding, and Target Encoding transform discrete categories into numerical features suitable for statistical models.
- Continuous Mapping: Techniques such as Normalization and Standardization scale continuous variables to ensure uniformity across features.
- Dimensionality Reduction: Algorithms like PCA (Principal Component Analysis) and t-SNE compress high-dimensional data while preserving structure.
Semantic & Textual Representations
Text embeddings map linguistic units into dense vector spaces, preserving semantic relationships.
- Foundational Concept: Vector Embeddings: Semantic Representation for NLP and AI
- Mechanism: Converts words, phrases, or documents into numerical vectors where geometric proximity correlates with semantic similarity.
- Applications:
- Natural Language Processing (NLP) pipelines.
- Semantic search and recommendation systems.
- Input layers for transformers and other neural network architectures.
- Key Distinction: Unlike sparse one-hot representations, embeddings are dense and capture latent semantic structures, enabling generalization across unseen vocabulary.
References
- data-preprocessing
- Feature Engineering
- Vector Space Models