Age group classification and gender recognition from speech with temporal convolutional neural networks

Héctor A Sánchez-Hevia, Roberto Gil-Pita, Manuel Utrilla-Manso, Manuel Rosa-Zurera

doi:10.1007/s11042-021-11614-4


Journal: Multimedia Tools and Applications
Publication Date: Jan 1, 2022
Citations: 15
License type: open-access

Affiliation: University of Alcalá

Abstract

This paper analyses the performance of different types of Deep Neural Networks for jointly estimating age and identifying gender from speech, with application to the Interactive Voice Response (IVR) systems used in call centres. Deep Neural Networks are used because they have recently demonstrated strong discriminative and representation capabilities in a wide range of applications, including speech processing problems based on feature extraction and selection. Networks of different sizes are analysed to determine how performance depends on the network architecture and on the number of free parameters. The speech corpus used for the experiments is Mozilla’s Common Voice dataset, an open, crowdsourced speech corpus. Gender classification results are very good regardless of the type of neural network, although they improve with network size. For classification by age group, the combination of convolutional neural networks and temporal neural networks appears to be the best option among the architectures analysed, and again, the larger the network, the better the results. The results are promising for use in IVR systems: the best systems achieve a gender identification error below 2% and an age-group classification error below 20%.
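The abstract links network size to the number of free parameters. As a rough illustration of that relationship, the sketch below counts the free parameters (weights and biases) of a stack of 1-D convolutional layers and computes its receptive field in frames. The layer layout (constant width, dilation doubled at every layer, as in generic temporal convolutional networks), the 40-feature input, the kernel size and the helper names are all assumptions for illustration, not the configurations evaluated in the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Conv1dLayer:
    """Shape of one stride-1, 1-D convolutional layer."""
    in_channels: int
    out_channels: int
    kernel_size: int
    dilation: int = 1

    def num_params(self) -> int:
        # One length-k kernel per (input, output) channel pair, plus one bias per output channel.
        return self.kernel_size * self.in_channels * self.out_channels + self.out_channels


def tcn_stack(in_channels: int, width: int, kernel_size: int, num_layers: int) -> List[Conv1dLayer]:
    """Hypothetical TCN-style stack: constant width, dilation doubled at every layer."""
    layers = []
    channels = in_channels
    for i in range(num_layers):
        layers.append(Conv1dLayer(channels, width, kernel_size, dilation=2 ** i))
        channels = width
    return layers


def total_params(layers: List[Conv1dLayer]) -> int:
    """Total number of free parameters in the stack."""
    return sum(layer.num_params() for layer in layers)


def receptive_field(layers: List[Conv1dLayer]) -> int:
    # With stride 1, each layer widens the receptive field by (k - 1) * dilation frames.
    return 1 + sum((layer.kernel_size - 1) * layer.dilation for layer in layers)


if __name__ == "__main__":
    # Assumed input: 40 acoustic features per frame; kernel size 3; 4 layers.
    for width in (16, 32, 64):
        stack = tcn_stack(in_channels=40, width=width, kernel_size=3, num_layers=4)
        print(f"width={width}: {total_params(stack)} parameters, "
              f"receptive field {receptive_field(stack)} frames")
```

Running it prints 4288, 13184 and 44800 parameters for widths 16, 32 and 64, all with a 31-frame receptive field. After the first layer every layer maps `width` channels to `width` channels, so the parameter count grows roughly quadratically with width, while the receptive field depends only on kernel size and dilations.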

Highlights

This paper deals with the design of algorithms to classify speakers into age and gender groups, to be applied in call centres with ‘Interactive Voice Response’ (IVR) systems
Block-level results are obtained by evaluating one-second audio blocks independently, while file-level results are obtained by averaging the outputs of all blocks that make up each file before computing the metrics
These results indicate that, when the system is deployed in a real IVR, around 80% of callers will be routed to the correct specialised agent
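The block-level and file-level evaluations described in the highlights can be sketched as follows. This is a minimal sketch, assuming each one-second block yields one score per class (for example, softmax outputs) and is tagged with the identifier of the file it came from; the function names and toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def argmax(scores: Sequence[float]) -> int:
    """Index of the largest score (first one on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def block_level_error(block_outputs: List[Sequence[float]], block_labels: List[int]) -> float:
    """Error rate when every one-second block is classified independently."""
    wrong = sum(argmax(out) != label for out, label in zip(block_outputs, block_labels))
    return wrong / len(block_labels)


def file_level_predictions(block_outputs: List[Sequence[float]], file_ids: List[str]) -> Dict[str, int]:
    """Average the outputs of all blocks in each file, then pick the top class."""
    grouped = defaultdict(list)
    for out, fid in zip(block_outputs, file_ids):
        grouped[fid].append(out)
    predictions = {}
    for fid, outs in grouped.items():
        n_classes = len(outs[0])
        mean = [sum(o[c] for o in outs) / len(outs) for c in range(n_classes)]
        predictions[fid] = argmax(mean)
    return predictions


def file_level_error(block_outputs, file_ids, file_labels: Dict[str, int]) -> float:
    """Error rate computed once per file, after block averaging."""
    preds = file_level_predictions(block_outputs, file_ids)
    wrong = sum(preds[fid] != file_labels[fid] for fid in preds)
    return wrong / len(preds)


if __name__ == "__main__":
    # Toy data: two files, two classes, one output vector per one-second block.
    outputs = [[0.9, 0.1], [0.4, 0.6], [0.7, 0.3],  # file "a", true class 0
               [0.3, 0.7], [0.6, 0.4]]              # file "b", true class 1
    ids = ["a", "a", "a", "b", "b"]
    labels = {"a": 0, "b": 1}
    print(block_level_error(outputs, [labels[f] for f in ids]))  # 0.4
    print(file_level_error(outputs, ids, labels))                # 0.0
```

In this toy data, averaging lets the correctly classified blocks outweigh the misclassified ones, so the file-level error (0.0) is lower than the block-level error (0.4). This is a property of the made-up example, not a reported result.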

Summary

Introduction

This paper deals with the design of algorithms to classify speakers into age and gender groups, to be applied in call centres with ‘Interactive Voice Response’ (IVR) systems. Most call-centre costs go to human resources, so a great effort has been made to optimise the use of agents, who are not homogeneous: they have different experience and skills, and handle customer requests at different speeds [40]. For this purpose, IVR systems apply different routing strategies, such as direct routing, self-service routing, skill-based routing, or data-directed routing. The prediction of the best match between customer and agent uses historical data from agents and customers, or other information that can be obtained from the customer's voice, such as emotions, age and gender. Both the information extraction and the best-match prediction can be implemented with machine learning techniques [20]. This paper presents a framework to jointly identify speakers’ gender and classify them into age groups, designed to be applied in IVR systems.
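To make the idea of routing on voice-derived information concrete, the following is a minimal sketch of how per-caller gender and age-group predictions could select an agent queue. The class labels, queue names, routing table and the 0.6 confidence threshold are hypothetical; the paper's actual age groups and routing policy are not specified here.

```python
from typing import Dict, Tuple

# Hypothetical routing table: (gender, age group) -> specialised agent queue.
# Labels and queue names are illustrative, not taken from the paper.
ROUTING_TABLE: Dict[Tuple[str, str], str] = {
    ("female", "young"): "agents_young",
    ("male", "young"): "agents_young",
    ("female", "adult"): "agents_adult_female",
    ("male", "adult"): "agents_adult_male",
    ("female", "senior"): "agents_senior",
    ("male", "senior"): "agents_senior",
}
DEFAULT_QUEUE = "agents_general"


def route_call(gender_probs: Dict[str, float], age_probs: Dict[str, float],
               min_confidence: float = 0.6) -> str:
    """Pick an agent queue from the classifier's per-class probabilities."""
    gender = max(gender_probs, key=gender_probs.get)
    age_group = max(age_probs, key=age_probs.get)
    # Assumed policy: send uncertain predictions to a general queue.
    if gender_probs[gender] < min_confidence or age_probs[age_group] < min_confidence:
        return DEFAULT_QUEUE
    return ROUTING_TABLE.get((gender, age_group), DEFAULT_QUEUE)


if __name__ == "__main__":
    print(route_call({"female": 0.95, "male": 0.05},
                     {"young": 0.1, "adult": 0.2, "senior": 0.7}))  # agents_senior
    print(route_call({"female": 0.95, "male": 0.05},
                     {"young": 0.3, "adult": 0.5, "senior": 0.2}))  # agents_general
```

The fallback queue is one possible way to limit the cost of misclassifications such as the age-group errors reported in the abstract; whether a real IVR would use a threshold, and at what value, is a design decision outside the scope of this sketch.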

Background on deep learning

Convolutional neural networks

Convolutional recurrent neural network

Temporal convolutional networks

Speech corpus

Experimental settings

Results and discussion

Conclusions

Similar Papers

Age and Gender Recognition from Speech Using Deep Neural Networks
Héctor A Sánchez-Hevia ... Roberto Gil-Pita
03 Nov 2020

Temporal Convolutional Attention Neural Networks for Time Series Forecasting
Yang Lin ... Irena Koprinska
18 Jul 2021

Estimating daily reference evapotranspiration based on limited meteorological data using deep learning and classical machine learning methods
Zhijun Chen ... Shijun Sun
Journal of Hydrology | VOL. 591
14 Jul 2020

Multimedia event detection via deep spatial-temporal neural networks
Jingyi Hou ... Yunde Jia
01 Jul 2016
