ASR Models
Architectures for automatic-speech-recognition (ASR) that convert acoustic signals into text sequences. They range from hybrid HMM-DNN systems to end-to-end Transformer and Conformer networks, and include streaming, non-streaming, and multimodal variants that integrate computer-vision or large-language-model context.
Notable Models & Updates
- IBM Granite Speech 4.1: Open ASR model within the Granite 4.1 family spanning language, vision, speech, and embeddings; emphasized for inference speed and enterprise applicability.
- Analysis Reference: "IBM Granite Speech 4.1 ASR Models: Features, Accuracy, and Enterprise Applications" (Sam Witteveen, 2026-05-08) covers features, accuracy benchmarks, and speed evaluation.
- Key Capabilities: Open-weight availability; optimized for low-latency transcription; part of broader multimodal foundation suite.
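The accuracy and speed claims above are usually quantified with word error rate (WER) and real-time factor (RTF). The source does not say which metrics the referenced analysis reports, so the following is only a minimal sketch of the conventional definitions. The function names `word_error_rate` and `real_time_factor` are illustrative, and the lowercase, whitespace-split normalization is an assumption; published benchmarks typically apply more careful text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    # Assumption: lowercase + whitespace split is the only normalization applied.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word present in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent transcribing / duration of the audio (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds
```

For example, `word_error_rate("the cat sat on the mat", "the cat sat on mat")` counts one deletion against six reference words, and an RTF below 1.0 means the model transcribes faster than the audio plays.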
Related Concepts
- Speech Processing
- Language Modeling
- Model Efficiency
- Open Source