vLLM

A high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
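Much of vLLM's memory efficiency comes from PagedAttention. PagedAttention stores each request's key/value (KV) cache in fixed-size blocks that need not be contiguous, and a per-sequence block table maps logical positions to physical blocks. The toy sketch below illustrates only that bookkeeping idea and is not vLLM's code. The class names (`BlockAllocator`, `Sequence`), the `BLOCK_SIZE` value, and the free-list policy are hypothetical choices made for illustration.

```python
# Toy sketch of paged KV-cache bookkeeping in the spirit of PagedAttention.
# Hypothetical names and sizes; this is NOT vLLM's actual implementation.

BLOCK_SIZE = 4  # tokens per KV-cache block (illustrative value, an assumption)


class BlockAllocator:
    """Hands out fixed-size physical blocks from one shared free pool."""

    def __init__(self, num_blocks: int) -> None:
        self.free = list(range(num_blocks))

    def allocate(self) -> int:
        if not self.free:
            raise MemoryError("KV cache exhausted")
        return self.free.pop()  # any free block works; no contiguity needed

    def release(self, block: int) -> None:
        self.free.append(block)


class Sequence:
    """One request's block table: logical block i -> physical block id."""

    def __init__(self, allocator: BlockAllocator) -> None:
        self.allocator = allocator
        self.block_table: list[int] = []
        self.num_tokens = 0

    def append_token(self) -> None:
        # Allocate a new block only when the last one is full, so at most
        # BLOCK_SIZE - 1 reserved slots per sequence ever sit unused.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(self.allocator.allocate())
        self.num_tokens += 1

    def slot(self, token_idx: int) -> tuple[int, int]:
        # Translate a logical token position into (physical block, offset).
        return self.block_table[token_idx // BLOCK_SIZE], token_idx % BLOCK_SIZE

    def release_all(self) -> None:
        # Return every block to the shared pool when the request finishes.
        for block in self.block_table:
            self.allocator.release(block)
        self.block_table.clear()
        self.num_tokens = 0


alloc = BlockAllocator(num_blocks=8)
a, b = Sequence(alloc), Sequence(alloc)
for _ in range(6):
    a.append_token()  # 6 tokens -> 2 blocks
for _ in range(3):
    b.append_token()  # 3 tokens -> 1 block
print(a.block_table, b.block_table, len(alloc.free))  # prints: [7, 6] [5] 5
```

The example shows the main design choice. Because memory is reserved in small blocks on demand, rather than as one contiguous region sized for the maximum output length, two concurrent requests here use only 3 of 8 blocks, and the remaining blocks stay available for other requests.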

Core Features

Recent Developments

Source Notes