Model Size
The physical and computational footprint of a machine learning model, primarily determined by the number of parameters (e.g., 7B, 70B) and their precision (e.g., 32-bit, 8-bit). Larger models require more storage, memory, and computational resources for training and inference.
Key implications:
- Storage: A 70B model at 16-bit precision (e.g., NVIDIA’s Llama 3.1 Nemotron 70B) occupies ~140GB (typically distributed as ~30 shard files of ~5GB each); at full 32-bit precision it would be ~280GB.
- Hardware demands: High memory bandwidth and VRAM needed for inference, limiting deployment on consumer hardware.
- Trade-offs: Larger size often correlates with better performance but increases latency and cost.
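The storage figures above follow from simple arithmetic: bytes ≈ parameter count × bits per parameter ÷ 8. A minimal sketch (the function name and the decimal-gigabyte convention are illustrative choices, not from the source):

```python
def model_size_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate storage footprint of model weights in gigabytes (decimal GB)."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 1e9

# A 70B-parameter model at common precisions:
print(model_size_gb(70e9, 32))  # 280.0 GB at full (32-bit) precision
print(model_size_gb(70e9, 16))  # 140.0 GB at half (16-bit) precision
print(model_size_gb(70e9, 8))   # 70.0  GB at 8-bit
```

This ignores non-weight overheads (optimizer state, KV cache, activations), which add further memory demands during training and inference.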
Quantisation as an optimisation technique:
- Quantisation reduces parameter precision (e.g., 16-bit → 4-bit), shrinking storage needs by ~75% (e.g., a 70B model drops from ~140GB to ~35GB).
- Enables deployment of large models on edge devices and reduces inference latency.
- Source: Adam Lucek - quantisation of LLM (2026-04-14 video).
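To make the precision-reduction idea concrete, here is a minimal sketch of symmetric (absmax) quantisation to the int8 range, written in plain Python. This is an illustrative toy, not the method from the cited video or any particular library's implementation:

```python
def quantize_int8(weights):
    """Symmetric (absmax) quantisation: map floats onto the int8 range [-127, 127].

    The largest-magnitude weight defines the scale; every weight is then
    rounded to the nearest representable integer step.
    """
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the integer codes."""
    return [v * scale for v in q]

weights = [0.5, -1.2, 0.03, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each quantised value needs 1 byte instead of 4 (float32): ~75% smaller,
# at the cost of a rounding error of at most scale / 2 per weight.
```

The trade-off is visible in the round-trip: `restored` differs from `weights` by at most half a quantisation step, which is why quantised models usually lose a little accuracy in exchange for the smaller footprint.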
Source Notes
- 2026-04-14: Qwen TTS model - Sam Witteveen channel — https://www.youtube.com/watch?v=jZ8wPB-KI8g — the Qwen team recently open-sourced the Qwen3-TTS family of models, which includes features like voice design, voice c…
- 2026-04-23: Engine Survival: The Critical Role of Oil Pressure and Warning Lights
- 2026-04-14: [[lab-notes/2026-04-14-Optimizing-AI-Costs-and-Privacy-with-Local-Open-Source-Models-and-Hybr|“But OpenClaw is expensive…“]]