Discrete Token Models

Discrete Token Models are architectures that generate or process data as a sequence of discrete symbols (tokens) drawn from a finite vocabulary. The term is most common in natural language processing, though images and other continuous signals can also be tokenized. This category primarily encompasses Auto-Regressive Modeling and emerging non-autoregressive approaches such as diffusion-based text generation.
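To make the definition concrete, here is a minimal sketch of how text becomes a sequence of integer token IDs from a finite vocabulary. It assumes a toy whitespace-level vocabulary. The functions `build_vocab`, `encode`, and `decode` are hypothetical, and production models typically use learned subword tokenizers (such as BPE) instead.

```python
# Toy word-level tokenizer (hypothetical; real systems usually use subword
# schemes such as BPE). Every word maps to an integer ID in a finite
# vocabulary, and unknown words fall back to the <unk> token.

def build_vocab(corpus: list[str]) -> dict[str, int]:
    """Assign an integer ID to every distinct whitespace-separated word."""
    vocab = {"<unk>": 0}
    for sentence in corpus:
        for word in sentence.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def encode(text: str, vocab: dict[str, int]) -> list[int]:
    """Map each word to its ID; out-of-vocabulary words become <unk> (ID 0)."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]


def decode(ids: list[int], vocab: dict[str, int]) -> str:
    """Invert the vocabulary to turn IDs back into words."""
    inverse = {i: w for w, i in vocab.items()}
    return " ".join(inverse[i] for i in ids)


vocab = build_vocab(["the cat sat", "the dog ran"])
ids = encode("the cat ran fast", vocab)
print(ids)                 # [1, 2, 5, 0]
print(decode(ids, vocab))  # the cat ran <unk>
```

The round trip loses "fast" because it is not in the vocabulary, a small preview of the lossiness discussed under Key Challenges.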

Core Architectures

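Auto-Regressive Models: The dominant paradigm, in which tokens are generated one at a time from left to right, each step conditioned on all previously generated tokens. Decoder-only Transformers, the architecture behind most current LLMs, fall into this category.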
Non-Autoregressive / Parallel Generation: Emerging methods that aim to reduce inference latency by generating multiple tokens in parallel, or by iteratively refining a full draft sequence, rather than following a strict left-to-right dependency.
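The difference between the two paradigms is easiest to see in the decoding loop. The sketch below shows greedy autoregressive decoding under a simplifying assumption: `toy_next_token_logits` is a hypothetical bigram lookup standing in for a trained model, so it only looks at the last token, whereas a Transformer conditions on the whole prefix.

```python
# Toy sketch of autoregressive (left-to-right) greedy decoding.
# BIGRAM_SCORES and toy_next_token_logits are hypothetical stand-ins for a
# trained model; a real LLM would score the next token from the full prefix.

BIGRAM_SCORES = {
    "<bos>": {"the": 2.0, "a": 1.0},
    "the": {"cat": 1.5, "dog": 1.2},
    "a": {"cat": 1.0},
    "cat": {"sat": 2.0, "<eos>": 0.5},
    "dog": {"ran": 2.0},
    "sat": {"<eos>": 3.0},
    "ran": {"<eos>": 3.0},
}


def toy_next_token_logits(prefix: list[str]) -> dict[str, float]:
    """Score every candidate next token given the tokens generated so far."""
    return BIGRAM_SCORES[prefix[-1]]


def greedy_decode(max_len: int = 10) -> tuple[list[str], int]:
    """Generate one token per step; each step depends on earlier outputs."""
    tokens = ["<bos>"]
    model_calls = 0
    for _ in range(max_len):
        scores = toy_next_token_logits(tokens)  # one sequential forward pass
        model_calls += 1
        next_token = max(scores, key=scores.get)  # greedy: highest score wins
        if next_token == "<eos>":
            break
        tokens.append(next_token)
    return tokens[1:], model_calls


output, calls = greedy_decode()
print(output, calls)  # ['the', 'cat', 'sat'] 4
```

Each iteration must wait for the previous one to finish, so the number of sequential model calls grows with the output length. That is the bottleneck parallel generation methods try to remove.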

Recent Developments & Innovations

Text Diffusion Adaptation: Google DeepMind has explored adapting diffusion processes, originally developed for continuous data such as images, to discrete text generation.
- Text Diffusion: Google DeepMind’s Faster Parallel Text Generation via Denoising describes how denoising is used to generate text in parallel, sidestepping the sequential bottleneck of standard autoregressive models.
Discrete Latent Spaces: Techniques that map continuous latent vectors to entries in a learned discrete codebook (e.g., VQ-VAE), so that images and other continuous signals can be represented as token sequences, bridging image and text generation.
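The Text Diffusion entry above describes generating tokens in parallel through repeated denoising. As an illustration of that general idea, and not of DeepMind's actual system, the sketch below uses mask-predict-style iterative refinement: start from an all-masked sequence and, at each step, commit the most confident predictions for positions that are still masked. `toy_denoiser` and `TARGET` are hypothetical stand-ins for a trained model, and `length` must not exceed `len(TARGET)`.

```python
import math

MASK = "<mask>"

# Hypothetical sentence the toy denoiser "knows"; a real model would predict
# tokens from the partially masked input instead.
TARGET = ["discrete", "tokens", "can", "be", "denoised", "in", "parallel"]


def toy_denoiser(seq: list[str]) -> list[tuple[str, float]]:
    """Hypothetical stand-in for one parallel forward pass: returns a
    (predicted token, confidence) pair for every position at once."""
    preds = []
    for i in range(len(seq)):
        # Confidence rises with the number of already-revealed neighbours,
        # mimicking how extra context makes predictions easier.
        revealed = sum(
            1 for j in (i - 1, i + 1) if 0 <= j < len(seq) and seq[j] != MASK
        )
        preds.append((TARGET[i], 0.5 + 0.2 * revealed))
    return preds


def parallel_denoise(length: int, steps: int) -> tuple[list[str], int]:
    """Start fully masked; each step commits the most confident predictions
    for still-masked positions until nothing is masked."""
    seq = [MASK] * length
    per_step = math.ceil(length / steps)
    model_calls = 0
    while MASK in seq:
        preds = toy_denoiser(seq)
        model_calls += 1
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        masked.sort(key=lambda i: preds[i][1], reverse=True)  # most confident first
        for i in masked[:per_step]:
            seq[i] = preds[i][0]
    return seq, model_calls


seq, calls = parallel_denoise(len(TARGET), steps=3)
print(" ".join(seq), calls)  # discrete tokens can be denoised in parallel 3
```

Here seven tokens are produced in three sequential model calls rather than seven. The `steps` parameter trades the number of passes against how many tokens are committed at once, which is where the coherence risk noted under Key Challenges comes from.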
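The core operation behind the Discrete Latent Spaces entry is a nearest-neighbour codebook lookup. Below is a minimal sketch of that quantization step as used in VQ-VAE-style models, not a full VQ-VAE: the 2-D codebook is made up for illustration (in a VQ-VAE it is learned), and training details such as the straight-through gradient estimator are omitted.

```python
# Sketch of the vector-quantization step: each continuous latent vector is
# replaced by the index of its nearest codebook entry, turning a continuous
# signal into a sequence of discrete tokens. The codebook is illustrative only.

Vector = list[float]

CODEBOOK: list[Vector] = [
    [0.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents: list[Vector], codebook: list[Vector]) -> list[int]:
    """Return the index of the nearest codebook vector for each latent."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        for z in latents
    ]


def dequantize(indices: list[int], codebook: list[Vector]) -> list[Vector]:
    """Look the discrete tokens back up to get approximate latents."""
    return [codebook[k] for k in indices]


latents = [[0.9, 0.2], [0.1, 0.8], [0.6, 0.7]]
tokens = quantize(latents, CODEBOOK)
print(tokens)                          # [1, 2, 3]
print(dequantize(tokens, CODEBOOK))    # [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
```

Once latents are token indices, they can be modeled with the same sequence machinery used for text.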

Key Challenges

Granularity & Lossiness: Mapping continuous signals onto a finite vocabulary or codebook is lossy; fine-grained information that falls between discrete symbols cannot be recovered exactly.
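Inference Speed vs. Quality: Autoregressive decoding needs one sequential forward pass per generated token. Parallel methods like Text Diffusion can reduce the number of sequential steps, but matching the stability and coherence of strong autoregressive baselines remains an open research problem.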
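Vocabulary Size Scaling: The embedding table and output softmax layer grow linearly with vocabulary size, so efficiency becomes harder to manage as token sets expand to cover multiple modalities.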
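The Granularity & Lossiness point can be illustrated with scalar quantization. This is a toy sketch under stated assumptions, not how any particular model discretizes data: it maps values in [0, 1] onto `num_levels` evenly spaced symbols and measures the worst-case reconstruction error. The error shrinks as the number of levels grows (to at most half the spacing between levels), but values that fall between levels are never reconstructed exactly.

```python
# Toy scalar quantizer: more levels (a larger "vocabulary") means finer
# granularity, but reconstruction is still approximate between levels.

def quantize(x: float, num_levels: int) -> int:
    """Map x in [0, 1] to the index of the nearest evenly spaced level."""
    return round(x * (num_levels - 1))


def dequantize(symbol: int, num_levels: int) -> float:
    """Map a level index back to its value in [0, 1]."""
    return symbol / (num_levels - 1)


def max_abs_error(values: list[float], num_levels: int) -> float:
    """Worst-case absolute reconstruction error over the given values."""
    return max(
        abs(v - dequantize(quantize(v, num_levels), num_levels)) for v in values
    )


values = [i / 1000 for i in range(1001)]  # dense sample of [0, 1]
for levels in (4, 16, 256):
    print(levels, round(max_abs_error(values, levels), 4))
```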
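For Vocabulary Size Scaling, a back-of-the-envelope count shows why larger token sets cost more. Assuming a standard embedding table plus an output projection, each of shape vocab_size by hidden_dim (biases omitted), both grow linearly with the vocabulary; weight tying, which shares one matrix between the two, halves the count. The `vocab_params` helper and the sizes below are illustrative and not taken from any specific model.

```python
# Back-of-the-envelope count of vocabulary-dependent weights.
# Illustrative sizes only; biases and all other layers are ignored.

def vocab_params(vocab_size: int, hidden_dim: int, tied: bool = False) -> int:
    """Weights in the embedding table plus the output projection."""
    embedding = vocab_size * hidden_dim
    output_head = 0 if tied else vocab_size * hidden_dim
    return embedding + output_head


for v in (32_000, 256_000):
    print(f"V={v:,}: {vocab_params(v, 4096):,} vocabulary-dependent weights")
```

With a 4,096-dimensional hidden state, going from a 32,000-token to a 256,000-token vocabulary raises these weights from about 262 million to about 2.1 billion, and the per-token softmax over the vocabulary grows by the same factor of eight.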

NemoClaw Knowledge Wiki
