Age estimation based on children's voice: a fuzzy-based decision fusion strategy.

Seyed Mostafa Mirhassani, Hua-Nong Ting, Alireza Zourmand

doi:10.1155/2014/534064

Open Access

Journal: TheScientificWorldJournal
Publication Date: Jan 1, 2014
Citations: 41
License type: CC BY 3.0

Affiliation: University of Malaya

Abstract

Automatic estimation of a speaker's age is a challenging research topic in the area of speech analysis. In this paper, a novel approach to estimate a speaker's age is presented. The method features a “divide and conquer” strategy wherein the speech data are divided into six groups based on the vowel classes. There are two reasons behind this strategy. First, splitting the data simplifies the distribution that each classifier must learn, which improves the classifier's learning performance. Second, different vowel classes contain complementary information for age estimation. Mel-frequency cepstral coefficients are computed for each group, and single hidden layer feedforward neural networks trained by a self-adaptive extreme learning machine are applied to the features to make a primary decision. Subsequently, fuzzy data fusion is employed to provide an overall decision by aggregating the classifiers' outputs. The results are then compared with a number of state-of-the-art age estimation methods. Experiments conducted on six age groups, including children aged between 7 and 12 years, revealed that fuzzy fusion of the classifiers' outputs resulted in a considerable improvement of up to 53.33% in age estimation accuracy. Moreover, the fuzzy fusion of decisions aggregated the complementary information about a speaker's age from various speech sources.
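
The decision-fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the fuzzy fusion rule, so the code below substitutes a plain weighted average of normalized classifier outputs; it is not the authors' method. The vowel labels, the weights, the example scores, and the assumption of one age group per year from 7 to 12 are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Assumption: one age group per year from 7 to 12 (six groups in total).
AGE_GROUPS = [7, 8, 9, 10, 11, 12]


def normalize(scores: List[float]) -> List[float]:
    """Shift and scale raw classifier outputs into memberships that sum to 1."""
    lowest = min(scores)
    shifted = [s - lowest for s in scores]
    total = sum(shifted)
    if total == 0:
        # All outputs equal: no preference for any age group.
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in shifted]


def fuse(outputs: Dict[str, List[float]], weights: Dict[str, float]) -> int:
    """Weighted-average fusion of per-vowel memberships (a stand-in for the
    paper's fuzzy fusion); returns the age group with the highest fused score."""
    fused = [0.0] * len(AGE_GROUPS)
    weight_sum = sum(weights[vowel] for vowel in outputs)
    for vowel, scores in outputs.items():
        memberships = normalize(scores)
        for k, value in enumerate(memberships):
            fused[k] += weights[vowel] * value / weight_sum
    best = max(range(len(fused)), key=fused.__getitem__)
    return AGE_GROUPS[best]


# Hypothetical outputs of three per-vowel classifiers for one speaker.
example_outputs = {
    "a": [0.1, 0.2, 0.9, 0.3, 0.1, 0.0],
    "i": [0.0, 0.1, 0.4, 0.8, 0.2, 0.1],
    "u": [0.2, 0.1, 0.7, 0.6, 0.1, 0.0],
}
equal_weights = {"a": 1.0, "i": 1.0, "u": 1.0}
estimated_age = fuse(example_outputs, equal_weights)
```

With equal weights, the classifiers for "a" and "u" outvote the one for "i". Raising the weight of "i" shifts the fused decision toward its preferred age group, which is how a fusion rule can exploit vowel classes that are more reliable than others.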

Highlights

Speaker age has attracted considerable attention among researchers studying recent applications of speech processing.
Speaker age also provides valuable information that can improve the performance of automatic speech recognition (ASR) systems [1, 2].
Single hidden layer feedforward neural networks (SLFNs) trained by self-adaptive extreme learning machine (SaELM) are used for classification.

Summary

Introduction

Speaker age has attracted considerable attention among researchers studying recent applications of speech processing. Speaker age also provides valuable information that can improve the performance of automatic speech recognition (ASR) systems [1, 2]. Many systems that employ speech data require some form of user adaptation that takes the user's age into account. In speech synthesis, the appropriate language model can be selected based on the age information of the speaker. In commercial applications such as advertising, the target age group can be effectively selected based on an estimate of the speaker's age. In ASR systems, the underlying model can be adaptively selected to improve the speech recognition rate.

Methods

Results

Conclusion