Abstract

Automatic speech recognition (ASR) is an effective technique for converting human speech into text or computer actions. ASR systems are widely used in smart appliances, smart homes, and biometric systems, and they incorporate signal processing and machine learning techniques to recognize speech. However, traditional systems perform poorly in noisy environments, and accents and regional differences further degrade recognition accuracy when analyzing speech signals. To overcome these issues, a precise speech recognition system was developed to improve performance. This paper uses speech data from the jim-schwoebel voice datasets, processed with Mel-frequency cepstral coefficients (MFCCs); the MFCC algorithm extracts the valuable features used to recognize speech. A sparse auto-encoder (SAE) neural network performs the classification, and a hidden Markov model (HMM) makes the final recognition decision. Network performance is optimized by applying the Harris Hawks optimization (HHO) algorithm to fine-tune the network parameters. The fine-tuned network can effectively recognize speech in noisy environments.
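The MFCC front end described above is a standard pipeline: pre-emphasis, framing and windowing, a power spectrum, a triangular mel filterbank, a log, and a DCT. The paper does not publish its implementation, so the following is only a minimal NumPy sketch of that generic pipeline; all parameter values (frame size, hop, filter counts) are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    """Minimal MFCC sketch; parameter defaults are illustrative assumptions."""
    # Pre-emphasis boosts high frequencies before analysis.
    sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(sig) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = sig[idx] * np.hamming(n_fft)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filterbank, spaced evenly on the mel scale.
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * mel_pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        lo, c, hi = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)
        fbank[m - 1, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)
    log_energy = np.log(power @ fbank.T + 1e-10)
    # DCT-II decorrelates the log energies; keep the first n_ceps coefficients.
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_mels))
    return log_energy @ dct.T

# Usage: MFCCs of one second of synthetic audio (a 440 Hz tone).
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
feats = mfcc(np.sin(2 * np.pi * 440.0 * t), sr=sr)
print(feats.shape)  # one row of 13 coefficients per frame
```

In the paper's framework these frame-level feature vectors would then feed the SAE classifier, with the HMM decoding the resulting frame scores into a recognition decision.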

Highlights

  • Artificial intelligence (AI) methods [1] evolve rapidly and are increasingly creating effective communication systems

  • The automatic speech recognition (ASR) system combines the fields of linguistics, computer science, natural language processing (NLP), and computer engineering

  • This paper proposed the Harris Hawks sparse auto-encoder network (HHSAE-ASR) framework for automatic speech recognition



Introduction

Artificial intelligence (AI) methods [1] are evolving rapidly and increasingly enable effective communication systems. AI can both analyze and recreate the human voice, and automatic speech recognition (ASR) systems [2] have been created to support communication and dialogue resembling real human conversation. The ASR field combines linguistics, computer science, natural language processing (NLP), and computer engineering. Speaker-dependent systems need a training process to understand individual speakers and recognize their speech: speakers read texts and vocabularies so the system can capture speaker-specific characteristics. Speaker-independent systems, by contrast, do not require this training process. Advances in machine learning and deep learning techniques are heavily applied in ASR, for example to improve Persian speech classification efficiently [3]. ASR is negatively affected by loud, noisy environments and fuzzy phonemes [4], which create challenging issues and lead to ambiguous recognition results.
