Self-Supervised Learning & Joint Embedding Predictive Architecture
Self-Supervised Learning (SSL) is a paradigm where models learn representations from unlabeled data by deriving supervisory signals from the structure of the data itself. A prominent SSL framework is the Joint Embedding Predictive Architecture (JEPA), proposed by Yann LeCun for training world models.
Joint Embedding Predictive Architecture (JEPA)
JEPA predicts future states or missing context within an abstract embedding space, rather than reconstructing raw data or predicting sequential tokens.
Core Mechanics
- Latent Prediction: A context encoder processes observed inputs to produce representations; a predictor network then forecasts the representations of target inputs (future or masked) in the latent space.
- No Reconstruction: Loss functions operate solely on embeddings, avoiding the high-dimensional noise and computational waste associated with pixel-level or token-level reconstruction.
- Modularity: Supports diverse modalities (vision, text, sensor data) by mapping inputs to a unified representation space before prediction.
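The mechanics above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a reference implementation of any published JEPA model: the encoders and predictor are untrained random linear maps, the separate target encoder is an assumption (the text above names only a context encoder and a predictor), and all names (`random_matrix`, `encode`, `jepa_loss`, `EMBED_DIM`) are hypothetical. The context and target inputs deliberately have different sizes to show two input types mapped into one shared embedding space, and the loss is computed only between embeddings.

```python
import random

EMBED_DIM = 3  # size of the shared embedding space (arbitrary choice)


def random_matrix(rows, cols, rng):
    """Return a rows x cols matrix of small random weights."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def encode(weights, x):
    """Linear map standing in for an encoder or predictor network."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def jepa_loss(context_x, target_x, ctx_enc, tgt_enc, predictor):
    """Mean squared error between predicted and actual target embeddings.

    No raw input is reconstructed: the comparison happens in latent space.
    """
    s_context = encode(ctx_enc, context_x)   # embedding of observed input
    s_target = encode(tgt_enc, target_x)     # embedding of masked/future input
    s_predicted = encode(predictor, s_context)
    squared = [(p - t) ** 2 for p, t in zip(s_predicted, s_target)]
    return sum(squared) / len(squared)


rng = random.Random(0)
ctx_enc = random_matrix(EMBED_DIM, 6, rng)        # context input has 6 features
tgt_enc = random_matrix(EMBED_DIM, 4, rng)        # target input has 4 features
predictor = random_matrix(EMBED_DIM, EMBED_DIM, rng)

context_x = [rng.random() for _ in range(6)]
target_x = [rng.random() for _ in range(4)]

loss = jepa_loss(context_x, target_x, ctx_enc, tgt_enc, predictor)
print(f"latent prediction loss: {loss:.4f}")
```

In a real system the linear maps would be deep networks trained by gradient descent on this loss, typically with some additional mechanism to prevent the embeddings from collapsing to a constant; that part is omitted here.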
Comparison to LLMs
- JEPA aims to address inefficiencies in large language models (LLMs) by modeling state transitions and causal structure directly in representation space, rather than relying on the statistical co-occurrence of tokens.
Related Projects & Challenges
- Project Aristotle: Implications and Challenges: Discusses broader implications and challenges in AI development, relevant to the shift from next-token prediction to world-modeling approaches like JEPA.