Autoregressive Models

Autoregressive models are a class of generative models that produce a sequence one element at a time, with each prediction conditioned on the previously generated elements. This modeling approach underlies most modern large language models (LLMs).

Core Mechanics

Sequential Generation: Predicts $x_t$ given $x_{1:t-1}$.
Factorization: Joint probability $P(x) = \prod_{t=1}^{T} P(x_t \mid x_{<t})$.
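Training Objective: Typically maximizes the likelihood of the training data via next-token prediction, which is equivalent to minimizing the negative log-likelihood (cross-entropy) of each token given its prefix.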
Inference Bottleneck: Strict sequential dependency limits parallelization during generation, creating latency challenges addressed by techniques like speculative decoding.
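
The factorization and the sequential generation loop above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the three-token vocabulary, the `COND` probability table, and the function names are hypothetical, and the toy model conditions only on the last token, whereas a real autoregressive model conditions on the whole prefix $x_{<t}$.

```python
import math
import random

BOS, EOS = "<s>", "</s>"

# Hypothetical toy table standing in for P(x_t | x_{<t}).
# It looks only at the previous token to keep the sketch small.
COND = {
    BOS: {"a": 0.6, "b": 0.4},
    "a": {"a": 0.1, "b": 0.6, EOS: 0.3},
    "b": {"a": 0.5, "b": 0.1, EOS: 0.4},
}


def next_token_probs(prefix):
    """Return the distribution over the next token given the prefix."""
    last = prefix[-1] if prefix else BOS
    return COND[last]


def sequence_log_prob(tokens):
    """Chain rule: log P(x) = sum over t of log P(x_t | x_{<t})."""
    total = 0.0
    for t, tok in enumerate(tokens):
        total += math.log(next_token_probs(tokens[:t])[tok])
    return total


def sample(rng, max_len=20):
    """Generate one token at a time, feeding each sample back as context."""
    out = []
    while len(out) < max_len:
        probs = next_token_probs(out)
        tok = rng.choices(list(probs), weights=list(probs.values()))[0]
        out.append(tok)
        if tok == EOS:
            break
    return out


# Joint probability of one sequence, built from per-step conditionals.
log_p = sequence_log_prob(["a", "b", EOS])
# The next-token training loss for this sequence is its negative log-likelihood.
nll = -log_p
generated = sample(random.Random(0))
```

The loop in `sample` cannot start step $t$ until step $t-1$ has produced a token, which is the sequential dependency behind the inference bottleneck.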

Key Architectures

Transformer: Dominant architecture for NLP; in autoregressive settings it uses causally masked self-attention, so each position attends only to earlier positions while still capturing long-range dependencies.
Recurrent Neural Network (RNN)/Long Short-Term Memory (LSTM): Predecessors to Transformers, utilizing hidden states to maintain context.
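
To show how self-attention is restricted in an autoregressive setting, here is a minimal sketch of causal (masked) attention in plain Python. It is a simplification under stated assumptions: a single head, no learned projections (queries, keys, and values are all the raw inputs), and a hypothetical function name `causal_self_attention`. Production Transformers add learned weights, multiple heads, and batched tensor operations.

```python
import math


def causal_self_attention(x):
    """Single-head attention over a list of T vectors.

    Assumption for illustration: queries, keys, and values are all x.
    Output position t is a weighted mix of positions 0..t only.
    """
    d = len(x[0])
    out = []
    for t in range(len(x)):
        # Scaled dot-product scores against positions 0..t (the causal mask
        # is expressed by never looking at positions after t).
        scores = [
            sum(a * b for a, b in zip(x[t], x[s])) / math.sqrt(d)
            for s in range(t + 1)
        ]
        # Numerically stable softmax over the visible positions.
        peak = max(scores)
        weights = [math.exp(score - peak) for score in scores]
        norm = sum(weights)
        out.append([
            sum(w / norm * x[s][j] for s, w in enumerate(weights))
            for j in range(d)
        ])
    return out


inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs = causal_self_attention(inputs)

# Changing the last input must not affect earlier outputs.
changed = causal_self_attention([[1.0, 0.0], [0.0, 1.0], [5.0, -3.0]])
```

Because position $t$ never reads positions after $t$, the first two rows of `outputs` and `changed` are identical. This is what lets a Transformer be trained on all positions of a sequence at once while still respecting the left-to-right factorization.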

Inference Optimization & Recent Developments

Speculative Decoding: A technique to accelerate autoregressive inference by using a smaller “draft” model to propose multiple tokens, which are then verified in parallel by the larger target model.
DeepSeek DSpark: A specific implementation of enhanced speculative decoding introduced by DeepSeek.
- Acts as a speed layer for LLMs, significantly accelerating inference without altering model weights.
- Demonstrated ability to double inference speed for models like Qwen3.
- See DeepSeek DSpark: LLM Inference Acceleration via Enhanced Speculative Decoding for detailed analysis.
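
The draft-then-verify idea behind speculative decoding can be sketched as follows. This is a generic greedy variant written for illustration, not DeepSeek's DSpark implementation. The callables `target_next` and `draft_next`, the toy models, and the parameter `k` are all hypothetical. Two simplifications are assumed: the verification step is written as a Python loop where a real system would use one batched forward pass of the target model, and the extra "bonus" token that the target can emit after a fully accepted draft is omitted.

```python
def speculative_decode(target_next, draft_next, prompt, num_new, k=4):
    """Greedy speculative decoding sketch.

    target_next, draft_next: hypothetical callables that map a token list
    to the next token each model would pick greedily.
    Returns the generated sequence and the number of target verification passes.
    """
    seq = list(prompt)
    goal = len(prompt) + num_new
    target_passes = 0
    while len(seq) < goal:
        # 1. The cheap draft model proposes up to k tokens autoregressively.
        proposal = []
        for _ in range(min(k, goal - len(seq))):
            proposal.append(draft_next(seq + proposal))
        # 2. The target model checks every proposed position. Counted as one
        #    pass because a real system would batch these positions together.
        target_passes += 1
        verified = [target_next(seq + proposal[:i]) for i in range(len(proposal))]
        # 3. Keep tokens while the draft agrees with the target; at the first
        #    disagreement keep the target's token and discard the rest.
        for drafted, checked in zip(proposal, verified):
            seq.append(checked)
            if drafted != checked:
                break
    return seq, target_passes


def toy_target(tokens):
    """Hypothetical target model: counts upward modulo 10."""
    return (tokens[-1] + 1) % 10


def toy_draft(tokens):
    """Hypothetical draft model: agrees with the target except after a 4."""
    return 0 if tokens[-1] == 4 else (tokens[-1] + 1) % 10


result, passes = speculative_decode(toy_target, toy_draft, [0], num_new=8)
```

In this sketch every appended token is the one the target model would have chosen for the same context, so the output matches plain greedy decoding with the target alone. The saving comes from needing fewer target passes than generated tokens whenever the draft model guesses correctly.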

References

DeepSeek DSpark: LLM Inference Acceleration via Enhanced Speculative Decoding

NemoClaw Knowledge Wiki
