Scalable Lookup

A technique for efficient memory access in large language models (LLMs). It separates simple recall, which a memory lookup can serve, from tasks that need deep computation, so the model does not spend compute re-deriving what it could retrieve directly.

Core Innovation

  • Conditional Memory via Scalable Lookup (DeepSeek Engram paper): Introduces a new axis of sparsity that avoids wasteful computation in Transformers by:
    • Distinguishing tasks requiring deep thought (computationally intensive) from simple recall (memory lookup)
    • Enabling scalable memory retrieval without increasing model parameter count
    • Providing task-specific computation allocation
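
The lookup idea above can be illustrated with a minimal sketch. This is not the paper's implementation: the class `NgramMemory`, the function `gated_memory_update`, the rolling hash, the table size, and the similarity-based gate are all hypothetical choices made for illustration. The sketch assumes that recent token ids are hashed into a fixed-size table of vectors (a constant-time lookup) and that a gate decides how much of the retrieved vector is mixed into the hidden state.

```python
import math
import random


class NgramMemory:
    """Toy hashed N-gram memory (hypothetical, for illustration only).

    A fixed-size table of vectors is addressed by hashing the last `n`
    token ids, so retrieval cost does not depend on the table size.
    """

    def __init__(self, num_slots, dim, n=2, seed=0):
        rng = random.Random(seed)
        self.num_slots = num_slots
        self.n = n
        # Randomly initialised here; a real model would learn these vectors.
        self.table = [
            [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for _ in range(num_slots)
        ]

    def slot(self, token_ids):
        # Deterministic rolling hash over the last n token ids.
        h = 0
        for t in token_ids[-self.n:]:
            h = (h * 1_000_003 + t) % self.num_slots
        return h

    def lookup(self, token_ids):
        # One index computation and one table read: no matrix multiplies.
        return self.table[self.slot(token_ids)]


def gated_memory_update(hidden, memory_vec):
    """Mix the retrieved vector into the hidden state through a scalar gate.

    The gate (an assumption of this sketch) is a sigmoid of the scaled dot
    product between the hidden state and the retrieved vector.
    """
    score = sum(h * m for h, m in zip(hidden, memory_vec))
    gate = 1.0 / (1.0 + math.exp(-score / math.sqrt(len(hidden))))
    updated = [h + gate * m for h, m in zip(hidden, memory_vec)]
    return updated, gate


# Usage: retrieve a memory vector for the current context and apply it.
memory = NgramMemory(num_slots=1024, dim=4, n=2)
hidden = [0.1, -0.2, 0.3, 0.0]
tokens = [17, 42, 7]
retrieved = memory.lookup(tokens)
updated, gate = gated_memory_update(hidden, retrieved)
```

In this sketch the table can grow (more slots) without changing the per-token work of `lookup`, which is the sense in which the retrieval is scalable. The gate stands in for the conditional part: when the retrieved vector is not relevant to the current hidden state, less of it is added.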

Key Implications

  • Eliminates redundant processing for recall-based tasks (e.g., fact retrieval, as opposed to multi-step reasoning)
  • Reduces computational cost while maintaining model capacity
  • Enables more efficient LLM inference through selective allocation of computation

Inference Infrastructure

Source Notes