Abstract

Classifying a speaker's age and gender is one of the most challenging problems in acoustic recognition. Although many studies have sought better results, classification accuracies are still not satisfactory. Motivated by the success of deep learning techniques in speech processing, we developed a deep neural network (DNN) architecture to classify speakers' age and gender. In this work, we propose a new data model and modify the Gaussian-Bernoulli restricted Boltzmann machine (GB-RBM) and Bernoulli-Bernoulli restricted Boltzmann machine (BB-RBM) architectures. We show that a DNN can be trained on real-valued age and gender data, and that it has strong expressive and discriminative power for classifying a speaker's age and gender, provided it is carefully initialized and trained. To evaluate the proposed model, we conducted several experiments, using dimensional grey-scale displays of the hidden units' activation probabilities, histograms of the weights and biases, and the mean squared error (MSE) as evaluation tools. The experimental results show that the proposed DNN model and modifications achieve faster learning.
