Children’s Age and Gender Recognition from Raw Speech Waveform Using DNN

Mousmita Sarma, Nagendra Kumar Goel, Kandarpa Kumar Sarma

doi:10.1007/978-981-15-2774-6_1

Abstract

We propose end-to-end deep neural network (DNN) architectures that operate on the raw speech waveform to estimate the age and gender of children aged 4–14 years. To this end, we design single-task and multi-task learning DNN configurations. In the multi-task learning DNN, age and gender serve as separate labels in two output layers, and the total objective loss is optimized jointly. Features are learned from the raw waveform within the DNN in a data-driven manner, which gives the learning process the freedom to learn gender- and age-discriminative features during training. Interleaved time-delay neural network and long short-term memory (TDNN-LSTM) layers with a time-restricted self-attention mechanism model the temporal dynamics of speech. Experimental results provide a comparative analysis of single-task and multi-task learning for age and gender recognition from children’s speech.
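The joint objective described for the multi-task configuration can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the two losses are combined, so the weighted sum of cross-entropies, the function names, the weights, and the choice of 11 one-year age classes are all assumptions made here for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability assigned to the target class index."""
    return -math.log(softmax(logits)[target])


def multitask_loss(age_logits, gender_logits, age_target, gender_target,
                   age_weight=1.0, gender_weight=1.0):
    """Total objective as a weighted sum of the two per-head losses.

    The weights are hypothetical; the abstract only says the total
    objective loss is optimized jointly.
    """
    age_loss = cross_entropy(age_logits, age_target)
    gender_loss = cross_entropy(gender_logits, gender_target)
    return age_weight * age_loss + gender_weight * gender_loss


# Assumed setup: 11 age classes (one per year, 4 to 14) and 2 gender classes.
age_logits = [0.0] * 11       # output of the hypothetical age head
gender_logits = [0.0, 0.0]    # output of the hypothetical gender head
total = multitask_loss(age_logits, gender_logits, age_target=3, gender_target=1)
```

With uniform scores from both heads, the total reduces to log(11) + log(2), which is the loss of an untrained model guessing at random on both tasks. In a single-task configuration only one of the two terms would be present.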

Full Text