GGUF format
GGUF is a binary serialization format designed for efficient inference and distribution of large language models (LLMs). It is optimized for single-file distribution and high-performance loading.
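The single-file layout starts with a small fixed header before the metadata key-value pairs and tensor data. As a minimal sketch, the fields below (little-endian `GGUF` magic, `uint32` version, `uint64` tensor count, `uint64` metadata KV count) follow the published GGUF header layout; the synthetic byte string is purely illustrative, not a real model file:

```python
import struct

GGUF_MAGIC = b"GGUF"

def parse_gguf_header(data: bytes) -> dict:
    """Parse the fixed-size prefix of a GGUF file.

    Layout (little-endian): 4-byte magic, uint32 version,
    uint64 tensor_count, uint64 metadata_kv_count.
    """
    if data[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", data, 4)
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }

# Synthetic header bytes for illustration: version 3, 2 tensors, 5 KV pairs.
header = GGUF_MAGIC + struct.pack("<IQQ", 3, 2, 5)
print(parse_gguf_header(header))
```

Because the header and metadata sit at the front of one file, a loader can map the tensor region directly into memory, which is what makes single-file distribution and fast loading practical.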
Ecosystem & Compatibility
- Hardware Backends: Optimized for execution across CPU, GPU, and NPU architectures.
- Software Integration:
  - Supported by nexa-sdk for private, local AI execution.
  - Interoperable with MLX in specific deployment environments.
  - Core format for ecosystems including llama.cpp and Ollama.
Related Concepts
- model-efficiency
- Inference
- GGML
Backlink: 2026-04-14 Nexa AI run models locally
Source Notes
- 2026-04-23: [[lab-notes/2026-04-23-Excels-IMPORTCSV-Dynamic-Multi-CSV-Data-Management-and-Reporting|Excel’s IMPORTCSV: Dynamic Multi-CSV Data Management and Reporting]]
- 2026-04-14: I Looked At Amazon After They Fired 16,000 Engineers. Their AI Broke Everything.