title: “Model Efficiency”
Model Efficiency
Model efficiency refers to how effectively a machine learning model uses computational resources (e.g., memory, processing power) while maintaining or improving performance. It covers both how models are designed and how they are trained, with the goal of minimizing resource consumption without sacrificing functionality.
Key Concepts
- Memory Footprint: The amount of memory used by a model during inference or training.
- Inference Latency: The time taken for a model to produce an output after receiving input.
- Training Efficiency: How quickly and effectively a model can be trained with limited resources.
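As a rough illustration of the first two concepts, the sketch below estimates the memory needed to store a model's weights at different precisions and measures the median latency of a forward pass. The names `weight_memory_bytes`, `median_latency_seconds`, and `toy_model` are hypothetical helpers introduced here, and the estimate covers weights only (activations, optimizer state, and framework overhead are assumed away).

```python
import time
from statistics import median

# Bytes needed per parameter at common precisions (weights only).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_bytes(num_params, dtype):
    """Approximate memory needed to store the weights alone."""
    return num_params * BYTES_PER_PARAM[dtype]


def median_latency_seconds(fn, example_input, warmup=3, runs=20):
    """Median wall-clock time of fn(example_input), measured after warm-up calls."""
    for _ in range(warmup):
        fn(example_input)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(example_input)
        timings.append(time.perf_counter() - start)
    return median(timings)


def toy_model(x):
    # Hypothetical stand-in for a real model's forward pass.
    return sum(v * v for v in x)


# Example: a 1M-parameter model at two precisions, plus a latency measurement.
fp32_bytes = weight_memory_bytes(1_000_000, "fp32")
int4_bytes = weight_memory_bytes(1_000_000, "int4")
latency = median_latency_seconds(toy_model, [0.1] * 1000)
print(f"fp32: {fp32_bytes / 1e6:.1f} MB, int4: {int4_bytes / 1e6:.1f} MB")
print(f"median latency: {latency * 1e6:.1f} microseconds")
```

Using the median rather than the mean keeps a single slow run (for example, one interrupted by the operating system) from skewing the latency figure.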
Related Technologies
- Quantization
- Pruning
- Knowledge Distillation
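The three techniques above can each be sketched in a few lines of plain Python. This is a minimal illustration under simplifying assumptions, not a production recipe: quantization is shown as symmetric per-tensor int8, pruning as unstructured magnitude pruning, and distillation as a temperature-scaled KL divergence between teacher and student outputs. All function names are hypothetical and operate on flat lists of floats.

```python
import math


def quantize_int8(weights):
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map quantized integers back to approximate float values."""
    return [v * scale for v in q]


def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


weights = [0.5, -1.27, 0.01, 0.9]
q, scale = quantize_int8(weights)
print(q, dequantize(q, scale))
print(magnitude_prune(weights, 0.5))
print(distillation_loss([1.0, 2.0, 2.5], [1.0, 2.0, 3.0]))
```

In practice these techniques are often combined, for example pruning or distilling a model first and then quantizing the result.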
Recent Developments
- Gemini 3 Flash: Focused on speed, efficiency, and low cost ($0.50/1M tokens); achieves 78% on [[benchmarks/swe-bench-verified|SWE-bench Veri
- Bonsai Image: Prism ML’s introduction of local 1-bit image generation through extreme quantization, significantly reducing model size while maintaining high quality for on-device inference. See Bonsai Image: Local 1-bit Image Generation Through Extreme Quantization.
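To make the storage savings of 1-bit quantization concrete, the sketch below binarizes a weight list to signs plus a single scale and packs eight weights into each byte. This is a generic sign-and-scale binarization chosen for illustration; it is an assumption, not a description of Prism ML's method, and the function names are hypothetical.

```python
def binarize(weights):
    """Return (signs, scale): signs are +1/-1, scale is the mean absolute value."""
    scale = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return signs, scale


def pack_bits(signs):
    """Pack +1/-1 signs into bytes, 8 weights per byte (+1 maps to bit 1)."""
    packed = bytearray((len(signs) + 7) // 8)
    for i, s in enumerate(signs):
        if s > 0:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


def unpack_and_dequantize(packed, count, scale):
    """Recover approximate weights: +scale where the bit is set, else -scale."""
    return [
        scale if (packed[i // 8] >> (i % 8)) & 1 else -scale
        for i in range(count)
    ]


weights = [0.4, -0.2, 0.1, -0.3, 0.5, -0.5, 0.2, 0.2, -0.8]
signs, scale = binarize(weights)
packed = pack_bits(signs)
approx = unpack_and_dequantize(packed, len(weights), scale)

fp32_bytes = 4 * len(weights)  # 32-bit floats for comparison
print(f"fp32: {fp32_bytes} bytes, 1-bit: {len(packed)} bytes (plus one scale)")
print(approx)
```

For large tensors the packed form approaches 1/32 of the fp32 size, since each weight takes one bit instead of 32 and the per-tensor scale is negligible.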