Reduced precision
Use of lower-precision data types (e.g., 8-bit, 4-bit) instead of standard 32/64-bit floating-point to reduce computational/memory costs in machine learning systems.
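A quick back-of-the-envelope check on the memory side: weight storage scales linearly with bit-width, so going from 32-bit to 4-bit cuts weight memory by 8x. The 7B parameter count below is just an illustrative figure:

```python
# Rough weight-memory arithmetic for a hypothetical 7B-parameter model.
params = 7e9
print(f"FP32: {params * 4 / 1e9:.1f} GB weights")    # 4 bytes per parameter -> 28.0 GB
print(f"FP16: {params * 2 / 1e9:.1f} GB weights")    # 2 bytes per parameter -> 14.0 GB
print(f"FP4:  {params * 0.5 / 1e9:.1f} GB weights")  # 0.5 bytes per parameter -> 3.5 GB
```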
- 4-bit training: Emerging methods enable direct training of large language models (LLMs) at 4-bit floating-point (FP4) precision, reducing memory bandwidth and compute requirements compared to traditional 16/32-bit training 4-bit
- Cost reduction: Training costs for state-of-the-art LLMs remain extremely high (e.g., GPT-4's training compute is estimated at ~$78M, and Sam Altman has claimed it cost over $100M) LLM training costs
- Key application: 4-bit quantization addresses scalability challenges in large language models by making training feasible with reduced hardware resources
- Trade-off: Training at such low precision requires specialized techniques (e.g., careful scaling and higher-precision accumulation) to maintain model accuracy; see the sketch after this list model-efficiency
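The FP4 training referenced above uses 4-bit floating-point formats (e.g., E2M1) plus stabilization tricks, but the core quantize/dequantize round trip is the same idea as in integer schemes. A minimal sketch, assuming a symmetric per-tensor absmax scheme (real systems typically quantize per group/channel and pack two 4-bit values per byte); the function names here are illustrative:

```python
import numpy as np

def quantize_int4_absmax(w):
    """Symmetric absmax quantization to signed 4-bit levels in [-7, 7]."""
    scale = np.abs(w).max() / 7.0                             # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)   # 4-bit codes, stored in int8 here
    return q, scale

def dequantize(q, scale):
    """Map the integer codes back to approximate float weights."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=4096).astype(np.float32)       # toy weight vector
q, scale = quantize_int4_absmax(w)
w_hat = dequantize(q, scale)
print("max abs round-trip error:", np.abs(w - w_hat).max())
```

The round-trip error printed at the end is exactly the accuracy loss the trade-off bullet refers to; in practice, low-precision training typically keeps master weights and optimizer state at higher precision and only runs the expensive matmuls on the 4-bit values.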
2026-04-14: How does 4-bit quantisation work?
Source Notes
- 2026-04-14: [[lab-notes/2026-04-14-Optimizing-AI-Costs-and-Privacy-with-Local-Open-Source-Models-and-Hybr|“But OpenClaw is expensive…”]]