Abstract

Voice recognition is the task of identifying a human voice using modern artificial intelligence techniques. It is widely used in real-time applications such as identity verification, assistance for people with hearing or speech impairments, and electronic voice eavesdropping. Identifying the salient and discriminative characteristics of the voice is the most challenging step in the recognition process, and existing approaches often struggle to deliver accurate predictions. This work therefore models an efficient deep learning approach to voice recognition in four phases: data acquisition, input voice pre-processing, word segmentation, and classification. For experimentation, an openly available online dataset is acquired and fed as input to the proposed model. The input is then pre-processed using an improved Mel-DCT filter, which performs voice activity detection and improves classifier performance. The pre-processed data is supplied to the word segmentation phase, where the words in the input are segmented using Grab Cut segmentation. Finally, classification is performed with an integrated Recurrent Neural Network (RNN) and Convolutional Neural Network (CNN). The two networks are integrated to overcome the drawbacks of a single-classifier model and to boost prediction accuracy.
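As a rough illustration of the pre-processing phase, the sketch below computes Mel-DCT style features for one audio frame and applies a simple voice activity check. The abstract does not say how the improved Mel-DCT filter differs from the standard one, so this uses the textbook pipeline (windowed power spectrum, triangular mel filterbank, log, DCT-II) and a plain energy threshold as stand-ins. All function names, parameter defaults, and the threshold value are assumptions made for illustration, not the authors' method.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Hamming-windowed power spectrum via a naive DFT (bins 0..N/2)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centres equally spaced on the mel scale."""
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mel_points = [low + (high - low) * i / (n_filters + 1)
                  for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mel_points]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):      # rising edge
            filt[k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge, peak of 1 at center
            filt[k] = (right - k) / (right - center)
        bank.append(filt)
    return bank


def dct2(values, n_coeffs):
    """Unnormalised DCT-II, keeping the first n_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(n_coeffs)]


def mel_dct_features(frame, sample_rate, n_filters=20, n_coeffs=12):
    """Log mel filterbank energies followed by a DCT (MFCC-style)."""
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Small epsilon (assumed) avoids log(0) on silent frames.
    log_energies = [math.log(sum(w * p for w, p in zip(filt, spectrum)) + 1e-10)
                    for filt in bank]
    return dct2(log_energies, n_coeffs)


def is_voiced(frame, threshold=1e-3):
    """Energy-based voice activity check; the threshold is an assumed value."""
    energy = sum(x * x for x in frame) / len(frame)
    return energy > threshold


if __name__ == "__main__":
    sr = 8000
    tone = [0.5 * math.sin(2 * math.pi * 440 * t / sr) for t in range(256)]
    print(is_voiced(tone), len(mel_dct_features(tone, sr)))
```

In a full pipeline, frames that fail the activity check would typically be dropped before segmentation, and the per-frame coefficient vectors stacked over time would form the input to the classifier. The naive DFT keeps the sketch dependency-free; a practical implementation would use an FFT routine instead.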
