Autoregressive Models
Autoregressive models are a class of generative models that produce a sequence one element at a time, with each prediction conditioned on the previously generated elements. Autoregressive generation is the basis of most modern large language models (LLMs).
Core Mechanics
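- Sequential Generation: Predicts $x_t$ given the preceding elements $x_{<t} = (x_1, \dots, x_{t-1})$.
- Factorization: Joint probability $p(x_1, \dots, x_T) = \prod_{t=1}^{T} p(x_t \mid x_{<t})$.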
- Training Objective: Typically maximum likelihood, implemented as next-token prediction (minimizing the cross-entropy of each token given its prefix).
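The three points above can be illustrated with a minimal sketch. The conditional table `COND` and the helper names are made up for illustration. The table looks only at the last token, which is a deliberately simplified special case: a real autoregressive model would condition on the whole prefix of earlier tokens.

```python
import math
import random

# Hypothetical toy table of p(next token | previous token).
# Each row sums to 1. "<bos>" and "<eos>" mark sequence start and end.
COND = {
    "<bos>": {"a": 0.6, "b": 0.4},
    "a": {"a": 0.1, "b": 0.6, "<eos>": 0.3},
    "b": {"a": 0.5, "b": 0.1, "<eos>": 0.4},
}

def next_token_probs(prefix):
    """Return the distribution over the next token given the prefix.

    Simplifying assumption: only the last token of the prefix is used.
    """
    return COND[prefix[-1]]

def log_likelihood(tokens):
    """Chain rule: the log joint is the sum of per-step conditional log-probs."""
    prefix = ["<bos>"]
    total = 0.0
    for tok in tokens:
        total += math.log(next_token_probs(prefix)[tok])
        prefix.append(tok)
    return total

def sample(rng, max_len=20):
    """Generate one token at a time, feeding each sample back in as context."""
    prefix = ["<bos>"]
    for _ in range(max_len):
        probs = next_token_probs(prefix)
        tok = rng.choices(list(probs), weights=list(probs.values()))[0]
        prefix.append(tok)
        if tok == "<eos>":
            break
    return prefix[1:]

if __name__ == "__main__":
    print(sample(random.Random(0)))
    print(math.exp(log_likelihood(["a", "b", "<eos>"])))
```

Maximum-likelihood training would adjust the entries of the conditional distribution to raise `log_likelihood` on observed sequences. Here the table is fixed, so the sketch covers only evaluation and sampling.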
Key Architectures
- Transformer: Dominant architecture for NLP; uses self-attention to capture long-range dependencies in autoregressive settings.
- Recurrent Neural Network (RNN) / Long Short-Term Memory (LSTM): Earlier sequential models, largely superseded by Transformers for text but still used for time-series modeling.
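In autoregressive Transformers, self-attention is usually restricted so that each position can attend only to itself and earlier positions (a causal mask). The following is a minimal single-head sketch in plain Python. It assumes the query, key, and value vectors are already computed, and the function names are illustrative rather than taken from any library.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def causal_self_attention(q, k, v):
    """Single-head attention where position t attends only to positions <= t.

    q, k, v are lists of vectors (lists of floats), one vector per position.
    """
    d = len(q[0])
    out = []
    for t in range(len(q)):
        # Score position t against positions 0..t only; later positions are
        # never visited, which is equivalent to masking them out.
        scores = [
            sum(qi * ki for qi, ki in zip(q[t], k[s])) / math.sqrt(d)
            for s in range(t + 1)
        ]
        weights = softmax(scores)
        out.append([
            sum(w * v[s][j] for s, w in enumerate(weights))
            for j in range(len(v[0]))
        ])
    return out

if __name__ == "__main__":
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for row in causal_self_attention(x, x, x):
        print(row)
```

Because the output at position t never depends on later positions, the outputs for a whole training sequence can be computed in one pass while still matching what the model sees during step-by-step generation.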
Comparison with Non-Autoregressive Generative Models
While autoregressive models dominate text generation, alternative paradigms exist:
- Diffusion Models: Iterative denoising processes. Traditionally used for images, recently adapted for discrete data.
- Flow Matching: A simulation-free method for training continuous normalizing flows, used for high-dimensional data generation.
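To make the contrast with one-token-at-a-time generation concrete, here is a toy sketch of iterative denoising on discrete data, loosely in the spirit of masked (absorbing-state) discrete diffusion. Everything in it is an assumption for illustration: `toy_denoiser` stands in for a learned network and simply knows a fixed target string, and the confidence score is made up. It is not the algorithm of any particular published model.

```python
MASK = "_"
TARGET = list("hello")  # made-up "clean" sequence known to the toy denoiser

def toy_denoiser(seq):
    """Stand-in for a learned network.

    Looks at the whole sequence at once and proposes a (token, confidence)
    pair for every masked position.
    """
    guesses = {}
    for i, tok in enumerate(seq):
        if tok == MASK:
            # Made-up confidence: count already revealed neighbours.
            neighbours = [seq[j] for j in (i - 1, i + 1) if 0 <= j < len(seq)]
            conf = sum(n != MASK for n in neighbours)
            guesses[i] = (TARGET[i], conf)
    return guesses

def iterative_unmask(length, per_step=2):
    """Start fully masked; each step commits the most confident guesses."""
    seq = [MASK] * length
    steps = 0
    while MASK in seq:
        guesses = toy_denoiser(seq)
        # Pick up to `per_step` positions with the highest confidence.
        best = sorted(guesses, key=lambda i: -guesses[i][1])[:per_step]
        for i in best:
            seq[i] = guesses[i][0]
        steps += 1
    return "".join(seq), steps

if __name__ == "__main__":
    print(iterative_unmask(len(TARGET)))
```

With `per_step=2`, the five-token sequence is filled in three denoiser calls, whereas strictly sequential generation would need five. Several positions are committed per step, and in no fixed left-to-right order.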
Recent Developments in Hybrid/Iterative Approaches
- DiffusionGemma (Google DeepMind’s iterative diffusion-based LLM for text generation): Represents a shift towards diffusion-based token generation, challenging the strictly autoregressive paradigm by refining tokens iteratively rather than predicting them one at a time.