Local AI Tools
Software and frameworks enabling the execution, management, and deployment of Large Language Models (LLMs) and other AI workloads on local hardware.
Key Frameworks & Tools
- llama.cpp: High-performance C/C++ library for running LLMs locally.
  - Router Mode: A native feature for hot-swapping models without restarting the server. See llama.cpp Router Mode: Native Hot-Swappable Local LLM Switching.
- Ollama: Simplifies running LLMs locally via CLI and REST API.
- LM Studio: Desktop GUI application for downloading and running local LLMs.
- Text Generation WebUI (oobabooga): Comprehensive web interface for local LLM inference.
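As a minimal sketch of Ollama's REST API, the Python below sends one non-streaming request to the `/api/generate` endpoint using only the standard library. It assumes Ollama is running on its default address (`http://localhost:11434`) and that the requested model has already been pulled; `build_generate_request` and `generate` are hypothetical helper names, not part of Ollama.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the generated text from the 'response' field."""
    req = build_generate_request(model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["response"]
```

For example, `generate("llama3.2", "Explain the KV cache in one sentence.")` returns the reply as a single string once `ollama pull llama3.2` has finished; `llama3.2` is only an example tag. Setting `"stream": False` gives up incremental output in exchange for one JSON object, which keeps parsing trivial.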
Concepts
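- Model Quantization: Storing model weights at lower numeric precision (e.g., the 4-bit Q4_K_M GGUF format) so larger models fit in limited VRAM or system RAM.
- Context Window: The maximum number of tokens (prompt plus generated output) the model can attend to in a single sequence.
- KV Cache: Stores the attention keys and values already computed for previous tokens so they are not recomputed for each new token, speeding up generation at the cost of memory that grows with context length.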
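Quantization, context window, and KV cache together decide how much memory a model needs. The sketch below estimates both parts with back-of-the-envelope formulas, assuming a decoder-only transformer: weights take roughly `params × bits_per_weight / 8` bytes, and an FP16 KV cache takes `2 × layers × context × kv_heads × head_dim × 2` bytes. `ModelShape` and the helper names are hypothetical, and the estimate ignores runtime overhead such as compute buffers.

```python
from dataclasses import dataclass

GIB = 1024 ** 3  # bytes per GiB


@dataclass
class ModelShape:
    """Architecture numbers of a decoder-only transformer (illustrative)."""
    n_params: float   # total parameter count
    n_layers: int     # number of transformer blocks
    n_kv_heads: int   # key/value heads (fewer than query heads under GQA)
    head_dim: int     # dimension of each attention head


def weight_bytes(shape: ModelShape, bits_per_weight: float) -> float:
    """Approximate size of the (quantized) weights: params * bits / 8."""
    return shape.n_params * bits_per_weight / 8


def kv_cache_bytes(shape: ModelShape, n_ctx: int, bytes_per_elem: int = 2) -> int:
    """KV cache size: one key and one value vector per layer, KV head and token.

    bytes_per_elem=2 corresponds to an unquantized FP16 cache.
    """
    return 2 * shape.n_layers * n_ctx * shape.n_kv_heads * shape.head_dim * bytes_per_elem


def estimate_gib(shape: ModelShape, bits_per_weight: float, n_ctx: int) -> float:
    """Weights plus KV cache, in GiB; excludes runtime overhead."""
    total = weight_bytes(shape, bits_per_weight) + kv_cache_bytes(shape, n_ctx)
    return total / GIB
```

For a hypothetical 8-billion-parameter model with a Llama-3-8B-style layout (32 layers, 8 KV heads, head dimension 128), `kv_cache_bytes(shape, 8192)` comes to exactly 1 GiB, and doubling the context doubles it. Assuming Q4_K_M averages a little under 5 bits per weight (say 4.8, an approximation; the actual file size is the better guide), `estimate_gib(shape, 4.8, 8192)` gives roughly 5.5 GiB, versus about 15 GiB for the 16-bit weights alone.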
Resources
- Local AI Hardware Requirements
- LLM Model Formats (GGUF, Safetensors)