Instruction Following

The ability of a language model to accurately interpret and execute user instructions, often involving complex reasoning, multi-step tasks, or specific formatting requirements. Instruction following is critical for practical applications of large language models in user-facing systems.

Best Small LLMs for Local Inference (for instruction following)

For running instruction-tuned LLMs on a 48GB VRAM NVIDIA GPU, the following quantized models are strong contenders:

  • Llama 3.1 70B (quantized): Meta's open model; at roughly 4-bit quantization it fits on 48GB of VRAM with strong instruction following
  • Gemma 2 27B (quantized): Google's mid-size model; fits with ample headroom, even at higher-precision quantization
  • Qwen 2 72B (quantized): Alibaba's open model; excels at instruction following at roughly 4-bit quantization
  • Mistral Large (quantized): strong instruction following, though at 123B parameters it needs aggressive (around 3-bit) quantization to fit in 48GB
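A back-of-envelope check shows why quantization is the deciding factor for the models above. This is a rough sketch: the 10% overhead figure for KV cache and activations is an assumption, and real footprints vary with context length and runtime.

```python
def model_vram_gb(params_billion, bits_per_weight, overhead_frac=0.1):
    """Rough VRAM estimate: weight storage plus a fractional overhead
    for KV cache and activations (the overhead figure is an assumption)."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * (1 + overhead_frac) / 1e9

# Llama 3.1 70B: fp16 is far beyond 48GB, 4-bit fits
print(f"70B fp16:  {model_vram_gb(70, 16):.0f} GB")
print(f"70B 4-bit: {model_vram_gb(70, 4):.0f} GB")
```

By this estimate a 70B model drops from roughly 150GB at fp16 to under 40GB at 4-bit, which is why the 70B-class entries in the list are only viable quantized.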

Among these, Llama 3.1 70B (quantized) is the strongest all-around contender. The binding constraint is weight footprint: at roughly 4 bits per weight, a 70B-parameter model occupies about 35–40GB, leaving headroom on a 48GB card for the KV cache, while smaller models such as Gemma 2 27B fit even at 8-bit precision.
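In practice, local inference with these quantized models is commonly done via llama.cpp and GGUF weights. A minimal sketch, assuming a local llama.cpp install and a 4-bit GGUF of Llama 3.1 70B (the filename below is illustrative, not an exact release artifact):

```shell
# -ngl 99 offloads all layers to the GPU; -c sets the context window.
llama-cli \
  -m ./Meta-Llama-3.1-70B-Instruct-Q4_K_M.gguf \
  -ngl 99 -c 8192 \
  -p "Rewrite this sentence in formal English: gonna need that report asap."
```

Q4_K_M is a typical quality/size trade-off for 4-bit GGUF quantization; lower-bit variants (e.g. Q3_K) trade accuracy for the extra headroom a 123B model like Mistral Large would need.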

Source Notes

  • 2026-04-14: [[lab-notes/2026-04-14-Optimizing-AI-Costs-and-Privacy-with-Local-Open-Source-Models-and-Hybr|“But OpenClaw is expensive…”]]