Low-Rank Adaptation (LoRA)

Low-Rank Adaptation is a Parameter-Efficient Fine-Tuning technique that freezes pre-trained model weights and injects trainable low-rank decomposition matrices into specific layers. Instead of updating the full weight matrix , LoRA learns a delta , where and with rank . This approach drastically reduces trainable parameters and memory footprint, prevents catastrophic forgetting, and achieves performance parity with full fine-tuning across diverse tasks.

Mechanism

  • Replaces weight updates with low-rank factors added to frozen weights: .
  • Optimizes only and ; base weights remain static.
  • Post-training, can be merged into for zero-latency inference.
  • Widely applied to attention projections in Transformer and Diffusion Models architectures.
  • Synergizes with model-compression (e.g., QLoRA) to enable fine-tuning on constrained hardware.

Recent Applications & Developments