Abstract

Speech recognition systems have made interaction between humans and machines practical and natural. Gender identification is used in many fields, such as security systems, robotics, artificial intelligence, and call centers. This paper presents a novel method for extracting features from speech audio to classify a speaker's gender as male or female. First, we pre-process the data to obtain smooth, noise-free signals. We then pass the pre-processed data to a multi-layer architecture that extracts the features. In the first layer, we compute the fundamental frequency (using the autocorrelation function), spectral entropy, spectral flatness, and mode frequency. In the second layer, we use a linear interpolation function to map the pre-processed data into a suitable range and extract Mel Frequency Cepstral Coefficients (MFCC) from the mapped data. We evaluate the proposed model on three datasets, TIMIT, RAVDESS, and BGC (self-created), using two machine learning classifiers, K-Nearest Neighbors (KNN) and Support Vector Machine (SVM). The highest accuracy, 96.8%, was obtained on the TIMIT dataset with KNN.
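To make the first-layer features more concrete, the following is a minimal sketch of how fundamental frequency (via autocorrelation), spectral flatness, and spectral entropy might be computed for a single frame. It is not the authors' implementation: the function names, the 50 to 400 Hz pitch search range, the frame lengths, the naive DFT, and the synthetic 125 Hz test tone are all assumptions chosen for illustration.

```python
import math

def autocorr_f0(frame, sample_rate, f_min=50.0, f_max=400.0):
    """Estimate fundamental frequency (Hz) from the autocorrelation peak.
    The search range f_min..f_max is an assumed speech pitch range."""
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]                      # remove DC offset
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), n - 1)
    best_lag, best_val = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        val = sum(x[i] * x[i + lag] for i in range(n - lag))
        if val > best_val:                             # keep the strongest peak
            best_lag, best_val = lag, val
    return sample_rate / best_lag

def power_spectrum(frame):
    """Power spectrum via a naive DFT (bins 1 .. n/2 - 1, DC skipped).
    A real system would use an FFT; this keeps the sketch dependency-free."""
    n = len(frame)
    power = []
    for k in range(1, n // 2):
        re = sum(frame[i] * math.cos(2 * math.pi * k * i / n) for i in range(n))
        im = sum(frame[i] * math.sin(2 * math.pi * k * i / n) for i in range(n))
        power.append(re * re + im * im)
    return power

def spectral_flatness(power, eps=1e-12):
    """Geometric mean over arithmetic mean of the power spectrum."""
    geo_mean = math.exp(sum(math.log(p + eps) for p in power) / len(power))
    arith_mean = sum(power) / len(power) + eps
    return geo_mean / arith_mean

def spectral_entropy(power, eps=1e-12):
    """Shannon entropy of the normalized power spectrum, scaled to [0, 1]."""
    total = sum(power) + eps
    probs = [p / total for p in power]
    h = -sum(p * math.log2(p) for p in probs if p > 0)
    return h / math.log2(len(power))

# Synthetic stand-in for a voiced frame: a pure 125 Hz tone (assumed data).
sample_rate = 8000
frame = [math.sin(2 * math.pi * 125.0 * i / sample_rate) for i in range(1024)]

f0 = autocorr_f0(frame, sample_rate)
power = power_spectrum(frame[:256])
flatness = spectral_flatness(power)
entropy = spectral_entropy(power)
```

For a pure tone, flatness and entropy are both close to zero, while noise-like frames push them toward one. In a pipeline like the one the abstract describes, such per-frame values would presumably be combined with the MFCC features and passed to a classifier such as KNN or SVM.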
