4-bit quantisation

A technique that reduces the numerical precision of a machine learning model's parameters to 4 bits each, substantially lowering memory footprint and compute cost while largely preserving model performance.

  • Julia Turc’s video discusses the evolution of training LLMs with reduced precision, particularly the shift toward 4-bit floating-point (FP4) training for cost efficiency
  • Training LLMs is extremely expensive: Stanford estimated the training of Google’s Gemini Ultra (2023) at ~$78 million (Sam Altman has claimed higher figures), and 2025 training costs are expected to rise further
  • Enables training and inference with reduced hardware requirements compared to full-precision (32-bit) models
  • Addresses the key challenge of maintaining model accuracy as precision drops, which requires advanced quantisation algorithms
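To make the memory saving concrete: a 7B-parameter model needs ~28 GB at 32 bits per weight but only ~3.5 GB at 4 bits (plus a small overhead for scales). A minimal sketch of the core idea, using symmetric absmax integer quantisation rather than any specific FP4 format (real schemes such as NF4 or MXFP4 use per-block scales and non-uniform code points; the function names here are illustrative):

```python
def quantize_int4(weights):
    """Map floats to 4-bit signed codes in [-8, 7] via an absmax scale.

    Toy per-tensor version; production quantisers work per block
    (e.g. 32-64 weights per scale) to limit error from outliers.
    """
    scale = max(abs(w) for w in weights) / 7  # largest magnitude maps to +/-7
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale

def dequantize_int4(codes, scale):
    """Recover approximate floats from codes and the stored scale."""
    return [c * scale for c in codes]

weights = [0.42, -1.30, 0.07, 0.91]
codes, scale = quantize_int4(weights)
approx = dequantize_int4(codes, scale)
# Each reconstructed weight is within half a quantisation step of the original.
```

Each code fits in 4 bits, so two weights pack into one byte; the accuracy challenge in the bullet above is exactly that this rounding error accumulates across billions of weights, which is what the more advanced algorithms try to control.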

2026 04 14 How does 4bit quantisation work