Abstract
Speech technology is a broad area comprising many applications, such as speech recognition, text-to-speech (TTS) synthesis, speaker identification and verification, and language identification. Different applications impose different constraints on the problem, and these are tackled by different algorithms. This chapter focuses on Automatic Speech Recognition (ASR), the automatic transcription of speech utterances into text of a given language. Even after years of extensive research and development, ASR remains a challenging field of research. In recent years, however, ASR technology has matured to a level where it is reasonably successful in certain domains. A well-known example is human-computer interaction, where speech serves as an interface, with or without other pointing devices. ASR is fundamentally a statistical problem: its objective is to find the most likely sequence of words, called the hypothesis, for a given sequence of observations. The observations are acoustic feature vectors representing the speech utterance. The performance of an ASR system can be measured by aligning the hypothesis with the reference text and counting the errors in the hypothesis, namely deleted, inserted, and substituted words.
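The error counting mentioned at the end of the abstract can be sketched in a few lines. This is a minimal illustration, not the chapter's own implementation: it assumes words are separated by whitespace, that every error has unit cost, and that the counts are summarized as the conventional word error rate, (S + D + I) / N, where N is the number of reference words. The names `ErrorCounts`, `align_and_count`, and `word_error_rate` are hypothetical and chosen only for this example.

```python
from typing import List, NamedTuple


class ErrorCounts(NamedTuple):
    substitutions: int
    deletions: int
    insertions: int


def align_and_count(reference: List[str], hypothesis: List[str]) -> ErrorCounts:
    """Align two word sequences by minimum edit distance and count error types."""
    n, m = len(reference), len(hypothesis)

    # cost[i][j] = minimum number of edits turning reference[:i] into hypothesis[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i  # delete all i reference words
    for j in range(1, m + 1):
        cost[0][j] = j  # insert all j hypothesis words

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = int(reference[i - 1] != hypothesis[j - 1])
            cost[i][j] = min(
                cost[i - 1][j - 1] + mismatch,  # match or substitution
                cost[i - 1][j] + 1,             # deletion
                cost[i][j - 1] + 1,             # insertion
            )

    # Walk back through the table to recover one optimal alignment.
    subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            mismatch = int(reference[i - 1] != hypothesis[j - 1])
            if cost[i][j] == cost[i - 1][j - 1] + mismatch:
                subs += mismatch
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return ErrorCounts(subs, dels, ins)


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + D + I) / N for whitespace-tokenized strings (an assumption)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    counts = align_and_count(ref_words, hypothesis.split())
    return sum(counts) / len(ref_words)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"
    print(align_and_count(ref.split(), hyp.split()))
    print(round(word_error_rate(ref, hyp), 3))
```

For the reference "the cat sat on the mat" and the hypothesis "the cat sit on mat", the alignment finds one substitution ("sat" to "sit") and one deletion (the second "the"), giving a word error rate of 2/6. When several alignments share the same minimum cost, the split between error types can differ, although the total error count does not; this sketch simply prefers matches and substitutions first, then deletions.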