AI Portal Gun

Automatic Speech Recognition

Automatic Speech Recognition (ASR) uses machine learning to convert spoken language into text, allowing machines to transcribe human speech. Typical applications include voice commands and transcription services.


Explainers

  • Automatic Speech Recognition: An Overview: This video highlights the role of ASR systems in applications such as voicemail transcription and closed captioning for deaf and hard-of-hearing viewers. It also covers open challenges and ongoing work in the field, including accent adaptation and improving transcription quality. For further study, see the ASR resources on Hugging Face.

Speech to Text Models

  • Whisper, an ASR system by OpenAI, transcribes speech in English and many other languages, and can also translate non-English speech into English. Trained on 680,000 hours of multilingual audio collected from the web, it handles accents, background noise, and technical language robustly. Its output can occasionally include text that was not actually spoken, but Whisper shows strong ASR results in about 10 languages and can improve accessibility tools.
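
As a rough illustration of the two Whisper tasks described above (transcription, and translation into English), here is a minimal sketch. It assumes the open-source `openai-whisper` Python package and ffmpeg are installed. The file name `sample.wav`, the helper names `build_options` and `run_whisper`, and the choice of the `base` checkpoint are hypothetical and used only for illustration.

```python
from pathlib import Path
from typing import Optional


def build_options(translate: bool = False, language: Optional[str] = None) -> dict:
    """Build keyword options for Whisper's transcribe() call.

    task="transcribe" keeps the spoken language; task="translate" asks
    Whisper to output English text for non-English speech.
    """
    options = {"task": "translate" if translate else "transcribe"}
    if language is not None:
        # Optional language hint such as "de"; if omitted, Whisper detects it.
        options["language"] = language
    return options


def run_whisper(audio_path: str, model_name: str = "base", **kwargs) -> str:
    """Load a Whisper checkpoint and return the recognized text."""
    # Imported lazily so the helper above works without the package installed.
    import whisper  # assumption: `pip install openai-whisper`

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, **build_options(**kwargs))
    return result["text"].strip()


if __name__ == "__main__":
    sample = Path("sample.wav")  # hypothetical recording; replace with a real file
    if sample.exists():
        print("Transcript:", run_whisper(str(sample)))
        print("English translation:", run_whisper(str(sample), translate=True))
    else:
        print(f"{sample} not found; point `sample` at a real recording")
```

Larger checkpoints generally trade speed for accuracy, so `base` is only a convenient starting point for experimentation.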

Advancements