Model Size

The physical and computational footprint of a machine learning model, primarily determined by the number of parameters (e.g., 7B, 70B) and their precision (e.g., 32-bit, 8-bit). Larger models require more storage, memory, and computational resources for training and inference.

Key implications:

  • Storage: A 70B model at 16-bit precision (e.g., NVIDIA’s Llama 3.1 Nemotron 70B) requires ~150GB (≈30 files × 5GB each); full 32-bit precision would need roughly double that (~280GB).
  • Hardware demands: High memory bandwidth and VRAM needed for inference, limiting deployment on consumer hardware.
  • Trade-offs: Larger size often correlates with better performance but increases latency and cost.
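The storage figures above follow directly from parameter count × bytes per parameter. A minimal sketch (the function name is illustrative, not from the source):

```python
def model_size_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate storage footprint in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

# A 70B-parameter model at different precisions:
print(model_size_gb(70e9, 32))  # → 280.0 (full 32-bit precision)
print(model_size_gb(70e9, 16))  # → 140.0 (the note's ~150GB figure includes rounding/overhead)
print(model_size_gb(70e9, 8))   # → 70.0
```

Real checkpoints add a small amount of metadata and file overhead on top of the raw parameter bytes, which is why the note's on-disk figure (~150GB) is slightly above the 16-bit raw size.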

Quantisation as an optimisation technique:

  • Quantisation reduces parameter precision (e.g., 32-bit → 8-bit), shrinking storage needs by ~75% (e.g., a 70B model from ~280GB at 32-bit to ~70GB at 8-bit).
  • Enables deployment of large models on edge devices and reduces inference latency.
  • Source: Adam Lucek - quantisation of LLM (2026-04-14 video).
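The precision reduction described above can be sketched with simple symmetric absmax quantisation, assuming NumPy; this is a toy illustration, not the method from the source video (production LLM quantisers typically work per-channel or per-group):

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Map float32 weights to int8 by scaling the max |weight| to 127."""
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 weights from int8 values."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)

print(q.nbytes / w.nbytes)        # → 0.25, i.e. the ~75% storage reduction
print(np.max(np.abs(w - w_hat)))  # small reconstruction error, bounded by scale/2
```

The storage saving is exact (1 byte vs 4 bytes per weight); the cost is the rounding error introduced when weights are snapped to the 256 representable int8 levels.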

Source Notes

2026 04 14 Adam Lucek quantisation of LLM