Abstract

This paper presents a voice gender recognition system. Acoustic features and Mel-Frequency Cepstral Coefficients (MFCCs) are extracted to determine the speaker's gender. Acoustic features are the most commonly used features in this kind of study, but in this work we combine them with MFCCs to test whether the combination yields more satisfactory results. To examine the performance of the proposed system, we evaluated it on four databases: the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), the Saarbruecken Voice Database (SVD), the CMU_ARCTIC database, and a self-created Amazigh speech database. At the pre-processing stage, we removed silence from the signals using the Zero-Crossing Rate (ZCR) but retained background noise. A Support Vector Machine (SVM) is used as the classification model. The combination of acoustic features and MFCCs achieves an average accuracy of 90.61% on the RAVDESS database, 92.73% on the SVD database, 99.87% on the CMU_ARCTIC database, and 99.95% on the Amazigh speech database.

Keywords: Signal processing, Gender recognition, Acoustic features, Mel-Frequency Cepstral Coefficients, Zero-crossing rate, Support Vector Machine
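The pipeline described above (ZCR-based silence removal, MFCC plus acoustic feature extraction, SVM classification) can be sketched in a few lines. The following is a minimal illustration, assuming librosa for ZCR/MFCC computation and scikit-learn for the SVM; the ZCR threshold, the number of MFCCs, and the particular acoustic features (spectral centroid, rolloff, bandwidth) are illustrative assumptions rather than the paper's exact configuration.

```python
import numpy as np
import librosa
from sklearn.svm import SVC


def remove_silence(y, frame_length=2048, zcr_threshold=0.02):
    """Drop non-overlapping frames whose zero-crossing rate suggests silence.

    The threshold is an assumed heuristic; the paper does not specify one.
    """
    zcr = librosa.feature.zero_crossing_rate(
        y, frame_length=frame_length, hop_length=frame_length, center=False)[0]
    frames = librosa.util.frame(y, frame_length=frame_length, hop_length=frame_length)
    keep = zcr >= zcr_threshold
    return frames[:, keep].T.flatten() if keep.any() else y


def extract_features(path, n_mfcc=13):
    """Concatenate averaged MFCCs with a few common acoustic features."""
    y, sr = librosa.load(path, sr=None)
    y = remove_silence(y)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr).mean()
    rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr).mean()
    bandwidth = librosa.feature.spectral_bandwidth(y=y, sr=sr).mean()
    return np.concatenate([mfcc, [centroid, rolloff, bandwidth]])


def train_gender_svm(wav_paths, labels):
    """Fit an RBF-kernel SVM on the combined feature vectors (binary gender labels)."""
    X = np.vstack([extract_features(p) for p in wav_paths])
    clf = SVC(kernel="rbf")
    clf.fit(X, labels)
    return clf
```

In practice, `wav_paths` and `labels` would come from one of the four databases listed above, with accuracy measured on a held-out split.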
