Memory-Footprint

The term “memory footprint” refers to the amount of memory (RAM) that a program or system uses while it is running. For Large Language Models (LLMs), this matters because inference needs enough memory to hold the model weights, the intermediate activations, and the key-value (KV) cache, which grows with sequence length and batch size.
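As a rough illustration of how the weights alone contribute to the footprint, the sketch below multiplies a parameter count by the storage width of each parameter. The 7-billion-parameter model and the helper names are hypothetical, chosen only for the arithmetic. Real deployments also need memory for activations, the KV cache, and runtime overhead, so these figures are a lower bound.

```python
def weight_memory_bytes(num_params: int, bytes_per_param: int) -> int:
    """Memory needed to hold the weights alone (no activations, no KV cache)."""
    return num_params * bytes_per_param


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to gibibytes (1 GiB = 2**30 bytes)."""
    return num_bytes / 2**30


if __name__ == "__main__":
    params = 7_000_000_000  # hypothetical model size, for illustration only
    for name, width in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
        size = to_gib(weight_memory_bytes(params, width))
        print(f"{name}: {size:.1f} GiB")
```

Halving the bytes per parameter halves the weight footprint, which is why lower-precision formats are a common first step when a model does not fit in memory.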

Key Concepts:

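  • KV Cache Compression: A technique that reduces the memory footprint of LLM inference by compressing the key-value (KV) cache, which stores the attention keys and values of previously processed tokens. A smaller cache lets longer contexts or larger batches fit in the same amount of memory.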
  • Compression Algorithms: Methods for reducing the size of data such as text, images, audio, and video. For LLMs, related techniques such as quantization reduce the memory needed to store and run a model.
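To make the KV cache concept concrete, here is a minimal sketch that estimates the cache size for a standard multi-head attention layout and shows one simple way to shrink it: symmetric 8-bit quantization. The model dimensions and function names are assumptions for illustration. Quantization is only one of several possible compression approaches and is not necessarily the method any particular system uses.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int, bytes_per_value: int) -> int:
    """Estimate KV cache size: one key and one value vector per token, head, and layer."""
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)


def quantize_int8(values: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized: list[int], scale: float) -> list[float]:
    """Recover approximate floating-point values from the quantized integers."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    # Hypothetical configuration: 32 layers, 32 heads of size 128, 4096 tokens.
    fp16 = kv_cache_bytes(32, 32, 128, 4096, 1, bytes_per_value=2)
    int8 = kv_cache_bytes(32, 32, 128, 4096, 1, bytes_per_value=1)
    print(f"fp16 cache: {fp16 / 2**30:.1f} GiB, int8 cache: {int8 / 2**30:.1f} GiB")

    keys = [0.5, -1.0, 0.25]
    q, scale = quantize_int8(keys)
    print(dequantize_int8(q, scale))  # close to the original values, not identical
```

The trade-off is precision: each stored value can differ from the original by up to half the quantization step, so the memory saving has to be weighed against any loss in output quality.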