Instruction Following

The ability of a language model to accurately interpret and execute user instructions, which often involve complex reasoning, multi-step tasks, or specific formatting requirements. Critical for practical applications of large language models in user-facing systems.

Best Small LLMs for Local Inference (for instruction following)

For running large language models (LLMs) that follow instructions well on a single NVIDIA GPU with 48GB of VRAM, a quantized Llama 3.1 70B is a strong contender. Other viable options include quantized versions of Gemma 2 27B, Qwen 2 72B, and Mistral Large. The larger models do not fit in 48GB at 16-bit precision (a 70B model needs roughly 140GB for its weights alone), so quantizing them to a lower bit width is what makes local inference on this setup possible.
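The sizing argument above can be checked with back-of-the-envelope arithmetic. The sketch below is only a rough estimate, not a measurement: it counts weight memory as parameters times bits per weight, and adds a flat overhead for the KV cache, activations, and runtime buffers. The function names, the 4 GiB overhead, and the 4-bit setting are illustrative assumptions, not figures from the text. Parameter counts are read off the model names, and Mistral Large is left out because the text gives no parameter count for it.

```python
# Rough VRAM estimate for quantized model weights.
# All names and the overhead figure are hypothetical, chosen for illustration.

GIB = 2**30  # bytes per GiB


def weight_memory_gib(params_billion: float, bits_per_weight: float) -> float:
    """Memory taken by the weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / GIB


def fits_in_vram(
    params_billion: float,
    bits_per_weight: float,
    vram_gib: float = 48.0,
    overhead_gib: float = 4.0,  # assumed allowance for KV cache and activations
) -> bool:
    """True if weights plus the assumed overhead fit within vram_gib."""
    return weight_memory_gib(params_billion, bits_per_weight) + overhead_gib <= vram_gib


# Parameter counts (in billions) taken from the model names in the text.
models = {
    "Llama 3.1 70B": 70,
    "Gemma 2 27B": 27,
    "Qwen 2 72B": 72,
}

if __name__ == "__main__":
    for name, params in models.items():
        for bits in (16, 4):
            size = weight_memory_gib(params, bits)
            verdict = "fits" if fits_in_vram(params, bits) else "does not fit"
            print(f"{name} at {bits}-bit: {size:.1f} GiB weights, {verdict} in 48 GiB")
```

Real usage depends on the quantization format, context length, and inference runtime, so treat the result as a first filter and confirm it by loading the model.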

Source Notes