Abstract

Speech is a key means of communication, and it is becoming a common, if not standard, interface to technology. Voice is increasingly used to control programs, appliances, and personal devices in homes, cars, workplaces, and public spaces, through smartphones and home assistant devices built on Amazon's Alexa, Google's Assistant, Apple's Siri, and other proliferating technologies. This is made possible by Automatic Speech Recognition (ASR), the process of translating spoken utterances into text. ASR enables machines to respond correctly and reliably to human voices and to provide useful services. Because speaking to a computer is faster than typing on a keyboard, people tend to prefer voice-driven systems; and because communication between people is dominated by speech, it is natural for them to expect voice interfaces to computers. Speech-to-text systems meet this expectation by allowing a computer to convert voice requests and dictation into text. A traditional ASR system comprises three models: an acoustic model, a language model, and a lexicon model. The main challenges in ASR are variation in speaking style, environmental conditions such as background noise, and speaker accent. To mitigate these challenges, deep learning models are used. The main idea is to analyze features of the input audio signal, such as the spectrogram and Mel-frequency cepstral coefficients (MFCC), and to develop state-of-the-art deep learning models. The proposed end-to-end model achieved an error rate of 0.60 on the LibriSpeech dataset.
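
As an illustration of the spectrogram feature mentioned above, the following is a minimal sketch of a magnitude spectrogram computed with a short-time Fourier transform. It is not the authors' pipeline: the function name `spectrogram`, the frame length, the hop size, and the Hann window are all assumptions chosen for the example, and a naive DFT stands in for the FFT routine a real system would use.

```python
import cmath
import math


def spectrogram(samples, frame_len=256, hop=128):
    """Return one list of magnitudes (bins 0..frame_len//2) per frame.

    Hypothetical helper for illustration; the parameters are assumptions.
    """
    # Hann window reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        # Naive DFT, O(N^2) per frame; a real pipeline would call an FFT.
        mags = []
        for k in range(frame_len // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Usage: a 1 kHz tone sampled at 16 kHz. With 64-sample frames the bin
# spacing is 16000 / 64 = 250 Hz, so the energy concentrates in bin 4.
tone = [math.sin(2 * math.pi * 1000 * n / 16000) for n in range(256)]
spec = spectrogram(tone, frame_len=64, hop=32)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), len(spec[0]), peak_bin)
```

MFCC features are typically derived from such a spectrogram by applying a mel-scale filter bank, taking the logarithm, and applying a discrete cosine transform; those steps are omitted here to keep the sketch short.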
