Memory-Footprint
The term “memory footprint” refers to the amount of memory (RAM) a program or system uses while it is running. For Large Language Models (LLMs), the concept is crucial because inference needs enough memory to hold the model weights as well as the activations computed at runtime.
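As a rough illustration of why weights dominate the footprint, the memory needed to hold them can be estimated as parameter count times bytes per parameter. The sketch below is a back-of-the-envelope estimate only: the function name `weight_memory_gib` and the `BYTES_PER_PARAM` table are hypothetical helpers introduced here, and the estimate ignores activations, the KV cache, and any framework overhead.

```python
# Approximate storage cost per parameter for common numeric formats.
# (Illustrative table; real quantized formats add small overheads for scales.)
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(num_params: int, dtype: str = "fp16") -> float:
    """Estimate the memory needed to hold model weights, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    # Hypothetical 7-billion-parameter model at several precisions.
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_memory_gib(7_000_000_000, dtype):.2f} GiB")
```

Under these assumptions, halving the bytes per parameter halves the weight footprint, which is why the numeric format matters so much for memory consumption.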
Key Concepts:
- KV Cache Compression: A technique that reduces the memory footprint of LLMs by compressing the key-value (KV) cache, making more efficient use of available memory.
- Compression Algorithms: Methods for reducing data size in various formats, including text, images, audio, and video files. For LLMs, they are used to reduce the cost of storing and running models.
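To make the KV cache item above concrete, the sketch below estimates the cache size for a standard transformer decoder, which stores one key tensor and one value tensor per layer for every cached token. All configuration values are hypothetical, the function name `kv_cache_bytes` is introduced here for illustration, and storing entries at lower precision is shown as just one simple form of compression, not as the specific method this note refers to.

```python
def kv_cache_bytes(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    seq_len: int,
    batch_size: int = 1,
    bytes_per_elem: int = 2,  # 2 bytes = fp16; 1 byte = an 8-bit cache
) -> int:
    """Estimate KV cache size in bytes for a decoder-only transformer."""
    # The leading factor of 2 accounts for keys and values.
    return (
        2 * num_layers * num_kv_heads * head_dim
        * seq_len * batch_size * bytes_per_elem
    )


if __name__ == "__main__":
    # Hypothetical model shape: 32 layers, 32 KV heads, head dimension 128.
    shape = dict(num_layers=32, num_kv_heads=32, head_dim=128, seq_len=4096)
    fp16 = kv_cache_bytes(**shape, bytes_per_elem=2)
    int8 = kv_cache_bytes(**shape, bytes_per_elem=1)
    print(f"fp16 cache: {fp16 / 2**30:.2f} GiB")
    print(f"8-bit cache: {int8 / 2**30:.2f} GiB")
```

Because the cache grows linearly with sequence length and batch size, it can rival the weights themselves for long contexts, which is what makes compressing it worthwhile.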
Related Links:
- concept
- Technical Overview of LLM Inference: Loading, Memory, and Quantization: Technical analysis of model loading, inference mechanics, and quantization strategies affecting memory consumption; source material by Caleb Writes Code.