Abstract

Speech recognition is a key area in the field of pattern recognition. This paper presents a comprehensive survey of speech recognition techniques for non-Indian and Indian languages and compiles some of the computational models used for processing speech acoustics. A large number of frameworks are available for speech processing and recognition for languages spoken around the globe. However, only a limited number of automatic speech recognition systems are available for commercial use, and a wide gap remains between the languages spoken around the globe and the technical support available for them. This paper examines the major challenges of speech recognition for different languages. Analysis of the literature shows that the lack of standard databases for minority languages hinders speech recognition research across the globe. Compared with non-Indian languages, research on speech recognition of Indian languages (except Hindi) has not yet achieved the expected milestones. A combination of MFCC features and a DNN–HMM classifier is the most commonly used system for developing ASR for minority languages, whereas for some majority languages researchers are using much more advanced DNN algorithms. It has also been observed that research in this field is quite thin and that more research needs to be carried out, particularly for minority languages.

Highlights

  • Automatic Speech Recognition (ASR) is the most continually evolving and explored area in speech processing

  • Environment variation, channel variation, speaking style, age, and gender all contribute to the difficulty of speech recognition; e.g., Kirchhoff and Vergyri [158] mentioned that the Arabic script omits vowels as well as other information related to the phones

  • The authors have presented an extensive review and analysis of different feature extraction techniques employed for speech recognition for non-Indian and Indian languages

Summary

Introduction

The most continually evolving and explored area in speech processing is Automatic Speech Recognition (ASR). [Complex & Intelligent Systems] [...]tation, which automatically generates written text from the input signal, database access, interfaces for human communication, machine control, and access to automatic remote services over a dial-up connection for some majority languages. The front-end processing includes pre-emphasis and feature extraction. The feature extraction methods are explained in later sections of the paper. At the back end, the speech sample is decoded with the help of knowledge gained from the acoustic model, language model, and pronunciation model. This transforms the input speech signal into a text string in a readable format. The front end corresponds to the training phase and the back end corresponds to the testing phase.
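The front-end steps named above (pre-emphasis followed by feature extraction) can be illustrated with a minimal sketch. It is not the paper's implementation. The function names, the pre-emphasis coefficient of 0.97, and the 400-sample frame with a 160-sample hop (25 ms and 10 ms at an assumed 16 kHz sampling rate) are all assumptions chosen for illustration. Per-frame log energy stands in for a real feature such as MFCC, which would need a filterbank and a cosine transform on top of this.

```python
import math


def pre_emphasis(signal, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1]; the first sample passes through.

    alpha = 0.97 is an assumed, commonly used value, not one taken from the paper.
    """
    if not signal:
        return []
    return [signal[0]] + [
        signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))
    ]


def frame_signal(signal, frame_len, hop):
    """Split a signal into overlapping frames, dropping a trailing partial frame."""
    last_start = len(signal) - frame_len
    return [signal[s:s + frame_len] for s in range(0, last_start + 1, hop)]


def log_energy(frame, floor=1e-10):
    """Log of the frame energy, floored so silent frames do not produce log(0)."""
    return math.log(max(sum(x * x for x in frame), floor))


def front_end(signal, frame_len=400, hop=160, alpha=0.97):
    """Hypothetical front end: pre-emphasis, framing, one feature per frame."""
    emphasized = pre_emphasis(signal, alpha)
    return [log_energy(f) for f in frame_signal(emphasized, frame_len, hop)]


if __name__ == "__main__":
    # One second of a 440 Hz tone at the assumed 16 kHz sampling rate.
    tone = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
    features = front_end(tone)
    print(len(features), "frames")
```

The resulting feature sequence is what a back end would score against its acoustic, language, and pronunciation models. That decoding stage is outside the scope of this sketch.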
