Abstract

Analysing the human voice has long been a challenge for the engineering community, with applications such as product review, emotional state detection, and AI development. Two basic tasks in voice or speech analysis are detecting the speaker's gender and, from the accent, the speaker's geographical region. This study presents a three-layer feature extraction method that works on the raw human voice to classify gender as male or female and to identify the region the voice comes from. The first layer computes the fundamental frequency, spectral entropy, spectral flatness, and mode frequency. The second layer extracts Mel Frequency Cepstral Coefficients (MFCC), and the third layer uses linear predictive coding (LPC). Ordinary voice recordings contain noise, which is removed through multiple audio filtering steps to obtain clean, smooth data. A multi-output 1D Convolutional Neural Network is used to recognize gender and region from a combined dataset consisting of the TIMIT, RAVDESS, and BGC datasets. The model predicts gender with 93.01% accuracy and region with 97.07% accuracy. The method outperforms the usual state-of-the-art methods on the separate datasets as well as on the combined dataset, for both gender and region classification.
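As a rough illustration of the first-layer spectral features named above, the following sketch computes spectral flatness and spectral entropy for a single audio frame using their common textbook definitions. The abstract does not give the authors' exact formulas, so the function names, the `eps` constant, the naive DFT, and the peak-bin frequency estimate are all assumptions made for illustration, not the paper's implementation.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive O(n^2) DFT power spectrum over the non-negative frequency bins.

    Illustrative only: a real pipeline would use an FFT library.
    """
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_flatness(power, eps=1e-12):
    """Geometric mean divided by arithmetic mean of the power spectrum.

    Close to 1 for noise-like frames, close to 0 for tonal frames.
    `eps` is an assumed guard against log(0) and division by zero.
    """
    log_mean = sum(math.log(p + eps) for p in power) / len(power)
    arith_mean = sum(power) / len(power) + eps
    return math.exp(log_mean) / arith_mean


def spectral_entropy(power, eps=1e-12):
    """Shannon entropy of the normalised power spectrum, scaled to [0, 1]."""
    total = sum(power) + eps
    probs = [p / total for p in power]
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return entropy / math.log2(len(power))


def peak_frequency(power, sample_rate, n):
    """Frequency in Hz of the strongest non-DC bin.

    A crude, hypothetical stand-in for a proper pitch tracker.
    """
    k = max(range(1, len(power)), key=lambda i: power[i])
    return k * sample_rate / n


# Example: a 200 Hz tone sampled at 8 kHz, one 400-sample frame.
sample_rate, n = 8000, 400
tone = [math.sin(2 * math.pi * 200 * t / sample_rate) for t in range(n)]
tone_power = power_spectrum(tone)
features = {
    "peak_hz": peak_frequency(tone_power, sample_rate, n),
    "flatness": spectral_flatness(tone_power),
    "entropy": spectral_entropy(tone_power),
}
```

A pure tone concentrates its energy in one bin, so both flatness and entropy come out near zero, while a frame with a flat spectrum (such as a single impulse) pushes both toward one. Per-frame values like these would typically be stacked with the MFCC and LPC features before being passed to a classifier.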
