Llama 3.1 Nemotron 70B
- Large language model (LLM) with 70.6 billion parameters; at 16-bit precision the weights are typically distributed as ~30 shard files of ~5 GB each (~140-150 GB total storage).
- Demands significant computational resources for inference at full precision, necessitating model-efficiency techniques.
- Used as a case study in Adam Lucek - quantisation of LLM to demonstrate why quantization is necessary and how to implement it for resource-constrained deployment.
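The storage and memory figures above follow from a back-of-envelope calculation (parameter count times bytes per parameter); a minimal sketch, with the precision-to-bytes mapping being a standard assumption rather than anything specific to this model's release:

```python
# Back-of-envelope weight memory for a 70.6B-parameter model at common
# precisions, illustrating why quantization matters for deployment.
PARAMS = 70.6e9  # Llama 3.1 Nemotron 70B parameter count

BYTES_PER_PARAM = {
    "fp32": 4.0,  # full precision
    "bf16": 2.0,  # typical distribution precision
    "int8": 1.0,  # 8-bit quantized
    "int4": 0.5,  # 4-bit quantized
}

def weights_gb(params: float, bytes_per_param: float) -> float:
    """Weight size in gigabytes (1 GB = 1e9 bytes); excludes activations/KV cache."""
    return params * bytes_per_param / 1e9

for prec, b in BYTES_PER_PARAM.items():
    print(f"{prec}: {weights_gb(PARAMS, b):.1f} GB")
```

At bf16 this gives ~141 GB, matching the ~30 shards of ~5 GB noted above; int4 quantization cuts that to ~35 GB, which is the motivation for the quantization case study.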
2026 04 14 Adam Lucek quantisation of LLM