Humans prefer to convey information through speech in a shared language. Speech recognition is the capability to recognize the words spoken by a speaker. Recent work shows growing attention among researchers in this field, especially in brain-like computing applications, and emphasizes the real-world usability of speech for speaker recognition across different applications. Automatic speech recognition (ASR) is the process of identifying human speech and converting it into text. It has gained considerable popularity in recent times and is a crucial area of research for human-to-machine interaction. Early methods relied on manual feature extraction and classical algorithms, including Hidden Markov Models (HMM), Gaussian Mixture Models (GMM), and Dynamic Time Warping (DTW). In recent years, neural networks, namely convolutional neural networks (CNN), recurrent neural networks (RNN), and Transformers, have been applied to ASR and have reached outstanding performance. This study introduces the Intelligent Speech Recognition using the Fractal Amended Grasshopper Optimization Algorithm with Deep Learning (ISR-AGODL) approach. The ISR-AGODL technique is designed to identify and recognize speech signals accurately. First, the speech signals are transformed into spectrograms. Next, features are extracted using a deep convolutional neural network (DCNN) model. The fractal AGO technique is then used to select the hyperparameters. Finally, the speech signals are recognized using the extreme gradient boosting (XGBoost) model. The ISR-AGODL method was validated through simulations on a benchmark dataset. The experimental results show that the ISR-AGODL method achieved an accuracy of 96.34%, outperforming other models.
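As a rough illustration of the first stage of the pipeline (transforming a speech signal into a spectrogram), the following Python sketch computes a magnitude spectrogram with a short-time discrete Fourier transform. The abstract does not state the spectrogram settings used in ISR-AGODL, so the frame length, hop size, Hann window, function names, and the synthetic tone standing in for speech are all assumptions made for illustration only.

```python
import cmath
import math


def hann_window(n):
    """Return an n-point symmetric Hann window (an assumed choice of window)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrogram(signal, frame_len=64, hop=32):
    """Short-time DFT magnitudes: one row per frame, one column per frequency bin.

    frame_len and hop are illustrative defaults, not values from the paper.
    """
    window = hann_window(frame_len)
    n_bins = frame_len // 2 + 1  # keep non-negative frequencies only
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        # Apply the window to the current frame.
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        row = []
        for k in range(n_bins):
            # Direct DFT of the windowed frame at frequency bin k.
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            row.append(abs(acc))
        frames.append(row)
    return frames


# Synthetic stand-in for a speech signal: a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]

spec = magnitude_spectrogram(tone)
# Index of the strongest frequency bin in the first frame.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), len(spec[0]), peak_bin)
```

With these assumed settings each bin spans 8000 / 64 = 125 Hz, so the 1 kHz tone peaks in bin 8. In a pipeline like the one described, a two-dimensional array of this kind would be the input to the DCNN feature extractor; a practical implementation would more likely use an FFT routine from a numerical library than the direct DFT shown here.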