Gender and region detection from human voice using the three-layer feature extraction method with 1D CNN

Mohammad Amaz Uddin, Refat Khan Pathan, Md Sayem Hossain, Munmun Biswas

doi:10.1080/24751839.2021.1983318

Open Access

https://doi.org/10.1080/24751839.2021.1983318

Abstract

Analysing the human voice has long been a challenge for the engineering community, with applications such as product review, emotional state detection, and AI development. Two basic tasks in voice or speech analysis are detecting the speaker's gender and, from the accent, the speaker's geographical region. This study presents a three-layer feature extraction method that works on the raw human voice to classify gender as male or female and to identify the region the voice comes from. The first layer computes fundamental frequency, spectral entropy, spectral flatness, and mode frequency. The second layer extracts Mel Frequency Cepstral Coefficients (MFCC), and the third layer applies linear predictive coding (LPC). Ordinary voice recordings contain noise, which is removed through multiple audio filtering steps to obtain smooth, noise-free data. A multi-output 1D Convolutional Neural Network recognizes gender and region on a combined dataset consisting of the TIMIT, RAVDESS, and BGC datasets. The model predicts gender with 93.01% accuracy and region with 97.07% accuracy. The method outperforms the usual state-of-the-art methods on both gender and region classification, on the separate datasets as well as on the combined dataset.
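As a rough illustration of the first feature layer, the sketch below computes three of the four descriptors named in the abstract (spectral flatness, spectral entropy, and mode frequency) from a power spectrum. It is not the authors' implementation: the abstract gives no formulas, so the definitions are the common textbook ones, "mode frequency" is assumed to mean the frequency of the strongest spectral bin, and the function names and the synthetic test tone are hypothetical. A naive DFT from the standard library stands in for a real FFT to keep the example dependency-free.

```python
import cmath
import math


def power_spectrum(signal, sample_rate):
    """Return (frequencies, power) for the non-negative bins of a naive DFT."""
    n = len(signal)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        freqs.append(k * sample_rate / n)
        power.append(abs(coeff) ** 2)
    return freqs, power


def spectral_flatness(power, eps=1e-12):
    """Geometric mean over arithmetic mean of the power spectrum.

    Values near 0 indicate a tonal signal, values near 1 a noise-like one.
    """
    log_mean = sum(math.log(p + eps) for p in power) / len(power)
    arith_mean = sum(power) / len(power)
    return math.exp(log_mean) / (arith_mean + eps)


def spectral_entropy(power, eps=1e-12):
    """Shannon entropy of the normalized power spectrum, scaled to [0, 1]."""
    total = sum(power) + eps
    probs = [p / total for p in power]
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return entropy / math.log2(len(power))


def mode_frequency(freqs, power):
    """Frequency of the strongest bin (an assumed reading of 'mode frequency')."""
    return freqs[max(range(len(power)), key=power.__getitem__)]


def first_layer_features(signal, sample_rate):
    """Collect the three descriptors into one feature dictionary."""
    freqs, power = power_spectrum(signal, sample_rate)
    return {
        "spectral_flatness": spectral_flatness(power),
        "spectral_entropy": spectral_entropy(power),
        "mode_frequency": mode_frequency(freqs, power),
    }


# Synthetic test input (not from the paper): a 120 Hz tone, 0.2 s at 1 kHz.
SAMPLE_RATE = 1000
tone = [math.sin(2 * math.pi * 120 * t / SAMPLE_RATE) for t in range(200)]
features = first_layer_features(tone, SAMPLE_RATE)
```

For this pure tone the strongest bin sits at 120 Hz, and both flatness and entropy come out close to zero, as expected for a signal whose energy is concentrated in one frequency. Real speech would be processed frame by frame, and the fundamental frequency would need a separate pitch estimator, which is omitted here.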

Journal: Journal of Information and Telecommunication
Publication Date: Oct 7, 2021
Citations: 8
License type: open-access
