Autoregressive Models

Autoregressive models are a class of generative models that produce a sequence one element at a time, with each prediction conditioned on the elements generated before it. This formulation underlies most modern large language models (LLMs).

Core Mechanics

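  • Sequential Generation: Predicts $x_t$ given the preceding elements $x_{<t} = (x_1, \dots, x_{t-1})$.
  • Factorization: Joint probability $p(x_1, \dots, x_T) = \prod_{t=1}^{T} p(x_t \mid x_{<t})$.
  • Training Objective: Typically maximizes the likelihood of the training data via next-token prediction, which is equivalent to minimizing the cross-entropy between the predicted and observed next tokens.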
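As a rough illustration of these mechanics, the sketch below scores a sequence with the chain rule (the log of the joint probability is the sum of the log conditionals), computes the average negative log-likelihood that next-token training would minimize, and samples one token at a time. The vocabulary, the probability table `COND`, and all function names are hypothetical; the table stands in for a learned model and, for brevity, conditions only on the last token of the prefix.

```python
import math
import random

# Hypothetical toy "model": a table of next-token probabilities.
# A real autoregressive model conditions on the whole prefix; this
# stand-in only looks at the last token, purely to keep the sketch short.
VOCAB = ["a", "b", "<eos>"]
BOS = "<bos>"
COND = {
    "<bos>": {"a": 0.5, "b": 0.4, "<eos>": 0.1},
    "a": {"a": 0.1, "b": 0.6, "<eos>": 0.3},
    "b": {"a": 0.5, "b": 0.1, "<eos>": 0.4},
}


def next_token_probs(prefix):
    """Return p(x_t | x_<t) as a dict over VOCAB."""
    last = prefix[-1] if prefix else BOS
    return COND[last]


def sequence_log_prob(tokens):
    """Chain rule: log p(x_1..x_T) = sum over t of log p(x_t | x_<t)."""
    total = 0.0
    for t, tok in enumerate(tokens):
        total += math.log(next_token_probs(tokens[:t])[tok])
    return total


def average_nll(tokens):
    """Per-token negative log-likelihood, the usual next-token training loss."""
    return -sequence_log_prob(tokens) / len(tokens)


def sample(rng, max_len=20):
    """Generate one token at a time, feeding each draw back in as context."""
    tokens = []
    while len(tokens) < max_len:
        probs = next_token_probs(tokens)
        tok = rng.choices(VOCAB, weights=[probs[v] for v in VOCAB])[0]
        tokens.append(tok)
        if tok == "<eos>":
            break
    return tokens
```

For example, `sequence_log_prob(["a", "b", "<eos>"])` is the log of 0.5 × 0.6 × 0.4, one factor per generated token, and `sample(random.Random(0))` returns a list of tokens drawn sequentially from the table.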

Key Architectures

  • Transformer: Dominant architecture for NLP; in autoregressive settings it uses causally masked self-attention, so each position attends only to earlier positions while still capturing long-range dependencies.
  • Recurrent Neural Network (RNN)/Long Short-Term Memory (LSTM): Earlier sequential models, largely superseded by Transformers for text but still relevant in time-series.
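To make the autoregressive use of self-attention concrete, here is a minimal sketch of single-head attention with a causal mask, written in plain Python. It assumes identity query, key, and value projections (a real Transformer learns these) and omits multiple heads, residual connections, and normalization; the function names are illustrative only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def causal_self_attention(x):
    """Single-head self-attention with a causal mask.

    x is a list of T vectors (lists of floats). Queries, keys, and values
    are all taken to be x itself, which is an assumption made for brevity.
    """
    d = len(x[0])
    out = []
    for t in range(len(x)):
        # Causal mask: position t only scores positions 0..t, never the future.
        scores = [
            sum(q * k for q, k in zip(x[t], x[s])) / math.sqrt(d)
            for s in range(t + 1)
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the visible positions.
        out.append(
            [sum(w * x[s][j] for s, w in enumerate(weights)) for j in range(d)]
        )
    return out
```

Because of the mask, changing a later input vector leaves the outputs at all earlier positions unchanged, which is what lets the same network be trained on every prefix of a sequence in a single pass.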

Comparison with Non-Autoregressive Generative Models

While autoregressive models dominate text generation, alternative paradigms exist:

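  • Diffusion Models: Generate samples by iterative denoising, starting from noise and refining the whole sample at each step. Traditionally used for images, and more recently adapted to discrete data such as text.
  • Flow Matching: Trains a continuous normalizing flow by regressing a velocity field that transports noise samples to data samples; generation then integrates this field. Used for high-dimensional data generation.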
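The contrast with token-by-token generation can be seen in a small sketch of flow-style sampling: every coordinate of the sample is updated in parallel at each step by following a velocity field from t = 0 to t = 1. In practice the field is a trained neural network; here, as a loudly stated simplification, the target distribution is a single point `target` and the path is the straight line x_t = (1 - t) x_0 + t x_1, for which the velocity has the closed form (target - x) / (1 - t). The function names are hypothetical.

```python
def velocity(x, t, target):
    """Closed-form velocity for a straight-line path toward a point target.

    Stand-in for a learned velocity network; valid only for t < 1.
    """
    return [(m - xi) / (1.0 - t) for xi, m in zip(x, target)]


def integrate(x0, target, steps=10):
    """Euler integration of dx/dt = velocity(x, t) from t = 0 to t = 1."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v = velocity(x, t, target)
        # All coordinates move together; there is no left-to-right ordering.
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

Calling `integrate([5.0, -2.0], [1.0, 3.0])` moves the starting point along a straight line so that it ends, up to floating-point error, at the target. The number of model evaluations is set by `steps` rather than by the sequence length, which is the main structural difference from autoregressive sampling.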

Recent Developments in Hybrid/Iterative Approaches
