Abstract

Automatic speech recognition (ASR) converts a speech signal into text accurately and efficiently. ASR has been studied for the past few decades, and ASR and spoken language systems have advanced steadily over that time. However, many technological hurdles remain before flexible solutions that satisfy users can be reached. The main factors are environmental noise, a lack of robustness to speech variations (foreign accents, sociolinguistic factors, gender and speaking rate), and spontaneous or freestyle speech, among others. To realise the ubiquitous adoption of speech technology, the gap between what speech recognition technologies can deliver and what humans need from them must be bridged. To close this gap, the technology must deliver robust, high recognition accuracy close to human performance, which calls for a focus on the open challenges in speech technology. Generally, the speech signal is taken as input, processed at the front end to extract features, and then modelled at the back end using the Gaussian mixture model (GMM). GMM mixture selection is quite important and depends on the size of the dataset. For a small vocabulary, triphone-based acoustic modelling gives good results, and it has been implemented for the Sanskrit language.
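The dependence of GMM mixture selection on dataset size can be illustrated with a minimal sketch. The abstract does not say how the number of mixtures is chosen; the Bayesian information criterion (BIC) used here is an assumed, commonly used criterion whose penalty grows with the number of samples, and the names `fit_gmm_1d`, `bic` and `select_mixtures` are hypothetical. The sketch works on one-dimensional toy values, whereas a real ASR front end would supply multi-dimensional feature vectors.

```python
import math


def gauss_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, k, iters=50):
    """Fit a k-component 1-D GMM with EM and return the final log-likelihood."""
    n = len(data)
    ordered = sorted(data)
    grand_mean = sum(data) / n
    grand_var = max(sum((x - grand_mean) ** 2 for x in data) / n, 1e-6)
    # Deterministic start: means at evenly spaced quantiles, shared variance.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    variances = [grand_var] * k
    weights = [1.0 / k] * k
    var_floor = 1e-3 * grand_var  # keeps a component from collapsing onto one point

    for _ in range(iters):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [w * gauss_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = max(sum(p), 1e-300)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-8:
                continue  # leave an effectively empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, var_floor)

    # Total log-likelihood of the data under the fitted mixture.
    return sum(
        math.log(max(sum(w * gauss_pdf(x, m, v)
                         for w, m, v in zip(weights, means, variances)), 1e-300))
        for x in data
    )


def bic(data, k):
    """BIC of a k-component fit: (k - 1) weights + k means + k variances."""
    n_params = 3 * k - 1
    return n_params * math.log(len(data)) - 2.0 * fit_gmm_1d(data, k)


def select_mixtures(data, candidates):
    """Return the candidate mixture count with the lowest BIC (assumed criterion)."""
    return min(candidates, key=lambda k: bic(data, k))
```

Calling `select_mixtures(values, [1, 2, 4, 8])` on a list of feature values returns the candidate with the lowest BIC. Because the penalty term scales with the logarithm of the number of samples, a small dataset tends to favour fewer mixtures, which is one way to read the abstract's remark about dataset size.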
