Abstract

Voice classification plays a meaningful role in many aspects of life in the digital era. In this research, we propose a deep learning method for predicting the gender and region of a Vietnamese speaker from the sound spectrum. Starting from the raw dataset, we preprocess the audio so that all recordings share a common sampling frequency and duration. We then extract Mel spectrogram features and feed them into a Convolutional Neural Network (CNN) for training and optimization. Our experiments on 37 samples taken from the VIVOS corpus audio dataset achieve an accuracy of 86.48% for predicting gender and 51.45% for predicting the region of the voice.
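The preprocessing and feature extraction steps described above can be sketched roughly as follows. This is a minimal NumPy-only illustration, not the authors' implementation: the 16 kHz sampling rate, 3-second clip length, FFT size, hop length, and number of Mel bands are all assumed values, and the function names (`fix_length`, `mel_filterbank`, `log_mel_spectrogram`) are hypothetical. It assumes the audio has already been resampled to a common rate, and it stops at the feature matrix that a CNN would take as input.

```python
import numpy as np

# All constants below are assumptions for illustration; the abstract does not state them.
SAMPLE_RATE = 16000   # assumed common sampling rate in Hz
CLIP_SECONDS = 3.0    # assumed fixed clip duration
N_FFT = 512           # assumed FFT window size in samples
HOP = 256             # assumed hop between frames in samples
N_MELS = 40           # assumed number of Mel bands


def fix_length(signal, sample_rate=SAMPLE_RATE, seconds=CLIP_SECONDS):
    """Truncate or zero-pad a 1-D signal to a fixed number of samples."""
    target = int(sample_rate * seconds)
    signal = np.asarray(signal, dtype=np.float32)[:target]
    return np.pad(signal, (0, target - len(signal)))


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(sample_rate=SAMPLE_RATE, n_fft=N_FFT, n_mels=N_MELS):
    """Triangular filters spaced evenly on the Mel scale.

    Returns an array of shape (n_mels, n_fft // 2 + 1).
    """
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_mels, len(fft_freqs)))
    for i in range(n_mels):
        left, center, right = hz_points[i:i + 3]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def log_mel_spectrogram(signal):
    """Compute a log-scaled Mel spectrogram of shape (N_MELS, n_frames)."""
    n_frames = 1 + (len(signal) - N_FFT) // HOP
    window = np.hanning(N_FFT)
    frames = np.stack(
        [signal[i * HOP: i * HOP + N_FFT] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # (n_frames, N_FFT // 2 + 1)
    mel = mel_filterbank() @ power.T                   # (N_MELS, n_frames)
    return 10.0 * np.log10(mel + 1e-10)                # small offset avoids log(0)


# Usage: a synthetic 1-second tone stands in for a real recording.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
clip = fix_length(np.sin(2 * np.pi * 440.0 * t))
features = log_mel_spectrogram(clip)
```

Because every clip is forced to the same length, every feature matrix has the same shape, which is what allows the spectrograms to be stacked into batches and treated as single-channel images by a convolutional model.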
