Language Independent Gender Identification from Raw Waveform Using Multi-Scale Convolutional Neural Networks

Krishna D.N, Sai Sumith Reddy, Amrutha D, Prabhu Aashish Garapati, Anudeepa Acharya, Triveni B.J

doi:10.1109/icassp40776.2020.9054738

Abstract

In this work, we propose a raw-waveform-based multi-scale convolutional neural network (CNN) approach for language-independent gender identification. Instead of handcrafted features, our approach feeds the raw audio waveform directly into a 1-dimensional multi-scale CNN for speaker gender classification. The multi-scale CNN applies filters of different sizes to the raw waveform, which lets it extract features at multiple scales. The network has three streams, each containing multiple residual blocks, and the features from all streams are combined after the last convolution layer to predict the gender label. Our gender identification dataset contains 176 hours of audio from 6 Indian languages (Hindi, English, Kannada, Telugu, Tamil, and Gujarati). Our experiments show that learning the gender identification task from the raw waveform gives better performance and speeds up training. Using the multi-scale CNN on the raw waveform outperforms the spectrogram-based model by an absolute improvement of 2.24%.
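To make the architecture described above more concrete, the following is a minimal NumPy sketch of the forward pass of a three-stream, multi-scale 1-D CNN applied to a raw waveform. It is not the authors' implementation. The kernel sizes, channel count, number of residual blocks, the max-pooling step, global average pooling, concatenation as the way stream features are "combined", the random initialisation, and the two-class softmax output are all hypothetical choices made for illustration. Training is not shown.

```python
import numpy as np

# Hypothetical hyperparameters (not taken from the paper).
KERNEL_SIZES = (3, 9, 27)  # one filter width per stream -> "multi-scale"
CHANNELS = 16              # filters per convolution layer
N_RES_BLOCKS = 2           # residual blocks per stream
POOL = 4                   # temporal max-pooling factor after the first conv


def conv1d(x, w, b):
    """'Same'-padded, stride-1 1-D convolution (cross-correlation, as in CNN libraries).

    x: (C_in, T), w: (C_out, C_in, K), b: (C_out,) -> returns (C_out, T).
    """
    _, _, k = w.shape
    pad_left = (k - 1) // 2
    pad_right = k - 1 - pad_left
    xp = np.pad(x, ((0, 0), (pad_left, pad_right)))
    windows = np.lib.stride_tricks.sliding_window_view(xp, k, axis=1)  # (C_in, T, K)
    return np.einsum("ctk,ock->ot", windows, w) + b[:, None]


def relu(x):
    return np.maximum(x, 0.0)


def max_pool(x, p):
    """Non-overlapping max pooling over time; drops any leftover tail samples."""
    t = (x.shape[1] // p) * p
    return x[:, :t].reshape(x.shape[0], -1, p).max(axis=2)


def residual_block(x, params):
    (w1, b1), (w2, b2) = params
    h = relu(conv1d(x, w1, b1))
    return relu(conv1d(h, w2, b2) + x)  # identity skip connection


def init_params(rng):
    def conv_params(c_out, c_in, k):
        scale = np.sqrt(2.0 / (c_in * k))  # He-style scaling for ReLU layers
        return rng.normal(0.0, scale, (c_out, c_in, k)), np.zeros(c_out)

    streams = []
    for k in KERNEL_SIZES:
        stem = conv_params(CHANNELS, 1, k)
        blocks = [(conv_params(CHANNELS, CHANNELS, k), conv_params(CHANNELS, CHANNELS, k))
                  for _ in range(N_RES_BLOCKS)]
        streams.append((stem, blocks))
    w_out = rng.normal(0.0, 0.01, (2, CHANNELS * len(KERNEL_SIZES)))  # 2 gender classes
    return streams, (w_out, np.zeros(2))


def forward(wave, params):
    """Raw waveform (T,) -> class probabilities (2,)."""
    streams, (w_out, b_out) = params
    feats = []
    for (w_in, b_in), blocks in streams:
        h = relu(conv1d(wave[None, :], w_in, b_in))  # every stream sees the same waveform
        h = max_pool(h, POOL)
        for block in blocks:
            h = residual_block(h, block)
        feats.append(h.mean(axis=1))  # global average pool over time -> (CHANNELS,)
    z = np.concatenate(feats)        # combine the streams after their last conv layer
    logits = w_out @ z + b_out
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()


# Usage: one hypothetical second of 16 kHz audio (random noise stands in for speech).
rng = np.random.default_rng(0)
params = init_params(rng)
wave = rng.standard_normal(16000)
probs = forward(wave, params)
print(probs.shape)
```

Each stream processes the same samples with a different receptive field, so pooling and joining their outputs lets the classifier use short- and long-range cues together. In a real model the parameters would be learned from labelled audio, for example by minimising cross-entropy.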
