ASR Accuracy

Quantitative measure of transcription fidelity in automatic speech recognition (ASR) systems relative to ground-truth references. Accuracy is usually reported through error rates, where lower is better, and is influenced by acoustic quality, language complexity, and model architecture.

Key Metrics

  • Word Error Rate (WER): Standard metric for space-separated languages; calculated as $\mathrm{WER} = \frac{S + D + I}{N}$, where $S$ = substitutions, $D$ = deletions, $I$ = insertions, and $N$ = total words in the reference. Lower values indicate higher accuracy.
  • Character Error Rate (CER): Computed like WER but over characters instead of words. Preferred for languages written without spaces between words (e.g., Chinese, Japanese) or when vocabulary coverage is limited.
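  • Real-Time Factor (RTF): Efficiency metric, defined as processing time divided by audio duration; values below 1 mean faster than real time. Affects perceived accuracy in streaming applications, where latency constraints apply.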
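
The metrics above can be sketched in a few lines of Python. This is a minimal illustration, not a reference scorer: it assumes whitespace tokenization for WER, treats every character (including spaces) as a unit for CER, applies no text normalization, and defines RTF as processing time divided by audio duration. The names `EditCounts`, `align`, `wer`, `cer`, and `rtf` are hypothetical helpers introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class EditCounts:
    """Edit operations needed to turn the reference into the hypothesis."""
    substitutions: int
    deletions: int
    insertions: int
    ref_len: int

    @property
    def error_rate(self) -> float:
        # (S + D + I) / N, with N the number of reference units.
        if self.ref_len == 0:
            raise ValueError("reference must not be empty")
        return (self.substitutions + self.deletions + self.insertions) / self.ref_len


def align(ref: Sequence[str], hyp: Sequence[str]) -> EditCounts:
    """Levenshtein alignment that also tracks S, D, and I counts."""
    # dp[i][j] = (cost, S, D, I) for ref[:i] versus hyp[:j].
    dp = [[(0, 0, 0, 0)] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, i, 0)  # delete every reference unit
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, 0, 0, j)  # insert every hypothesis unit
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]  # match, no extra cost
                continue
            c, s, d, n = dp[i - 1][j - 1]
            substitution = (c + 1, s + 1, d, n)
            c, s, d, n = dp[i - 1][j]
            deletion = (c + 1, s, d + 1, n)
            c, s, d, n = dp[i][j - 1]
            insertion = (c + 1, s, d, n + 1)
            dp[i][j] = min(substitution, deletion, insertion, key=lambda t: t[0])
    _, s, d, n = dp[-1][-1]
    return EditCounts(s, d, n, len(ref))


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate, assuming whitespace-separated words."""
    return align(reference.split(), hypothesis.split()).error_rate


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; every character counts as one unit."""
    return align(list(reference), list(hypothesis)).error_rate


def rtf(processing_seconds: float, audio_seconds: float) -> float:
    """Real-time factor: processing time divided by audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds
```

For the reference "the cat sat on the mat" and the hypothesis "the cat sit on mat", the alignment contains one substitution and one deletion, so `wer` returns 2/6. Because insertions are counted against the reference length, WER can exceed 1 when the hypothesis is much longer than the reference.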

Influencing Factors

  • Acoustic Conditions: Background noise, reverberation, and microphone quality degrade signal-to-noise ratio, increasing error rates. See Noise Robustness.
  • Domain Mismatch: Performance drops when test data distribution differs significantly from training data. Mitigated via Domain Adaptation.
  • OOV Tokens: Out-of-vocabulary terms contribute to substitution errors; addressed by subword tokenization or end-to-end models.
  • Speaker Variability: Accents, dialects, and speaking rate affect recognition consistency.
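
One simple way to gauge exposure to out-of-vocabulary terms is to measure the fraction of test tokens missing from a system's word list. The sketch below assumes whitespace tokenization and exact, case-sensitive matching; `oov_rate` and the example vocabulary are hypothetical and not tied to any particular toolkit.

```python
from typing import Iterable, Set


def oov_rate(tokens: Iterable[str], vocabulary: Set[str]) -> float:
    """Fraction of tokens that do not appear in the vocabulary."""
    token_list = list(tokens)
    if not token_list:
        raise ValueError("token sequence must not be empty")
    # Count tokens with no exact match in the vocabulary.
    missing = sum(1 for token in token_list if token not in vocabulary)
    return missing / len(token_list)


# Hypothetical example: a general vocabulary applied to a medical sentence.
vocabulary = {"the", "patient", "has"}
transcript = "the patient has tachycardia".split()
rate = oov_rate(transcript, vocabulary)
```

Here one of the four tokens ("tachycardia") is missing from the vocabulary, so `rate` is 0.25. A high value on in-domain text suggests that a word-level system is likely to produce substitution errors on those terms.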

Recent Models & Developments

See Also