ASR Models

Architectures for automatic speech recognition (ASR), which convert acoustic signals into text sequences. They range from hybrid HMM-DNN systems to end-to-end Transformer and Conformer networks, and include streaming, non-streaming, and multimodal variants that integrate computer-vision or large-language-model context.

Notable Models & Updates