Abstract

Automatic gender classification from speech is a challenging research field with a wide range of applications in human-computer interaction (HCI). Decades of research have shown promising results, but there is still room for improvement. Until now, gender classification has relied on differences between the spectral characteristics of male and female speech. We assumed that a neutral margin exists between the male and female spectral ranges and that this margin causes gender misclassification. To address this limitation, we studied three non-lexical speech features: fillers, overlapping, and lengthening. Statistical analysis showed that overlapping and lengthening are effective for gender classification. Next, we performed gender classification using overlapping, lengthening, and the baseline acoustic feature, Mel-frequency cepstral coefficients (MFCC). We sought the best results by applying various combinations of features either simultaneously or sequentially. We used two machine-learning methods, support vector machines (SVM) and recurrent neural networks (RNN), to classify gender. We achieved an accuracy of 89.61% with an RNN using a feature set that combined MFCC, overlapping, and lengthening simultaneously. In addition, we reclassified, using non-lexical features, only the data belonging to the neutral margin, which was selected empirically from the results of gender classification with MFCC alone. In this setting, the accuracy of the RNN using lengthening was 1.83% higher than when MFCC alone was used. We conclude that new speech features derived from a behavioral approach can be effective in improving gender classification, notably for emergency calls.
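The sequential use of features described above, in which only utterances inside the neutral margin are reclassified, can be sketched as follows. This is a minimal illustration, not the authors' implementation. The `Utterance` fields, the signed `mfcc_score`, the `margin` value, and the threshold rule in `lengthening_rule` are all hypothetical stand-ins for the trained SVM or RNN models used in the study.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Utterance:
    # Hypothetical signed score from a first-stage, MFCC-only classifier:
    # positive leans "male", negative leans "female".
    mfcc_score: float
    # Hypothetical lengthening feature (e.g. lengthened syllables per second).
    lengthening: float


def classify_sequentially(
    utt: Utterance,
    margin: float,
    second_stage: Callable[[Utterance], str],
) -> str:
    """Keep the MFCC decision outside the neutral margin; otherwise defer."""
    if abs(utt.mfcc_score) >= margin:
        return "male" if utt.mfcc_score > 0 else "female"
    return second_stage(utt)


def lengthening_rule(utt: Utterance) -> str:
    # Placeholder rule with an arbitrary threshold and direction (assumption),
    # standing in for a trained classifier over non-lexical features.
    return "female" if utt.lengthening > 0.3 else "male"


calls = [
    Utterance(1.2, 0.1),    # confident first-stage decision
    Utterance(-0.9, 0.5),   # confident first-stage decision
    Utterance(0.1, 0.5),    # inside the neutral margin, deferred
    Utterance(-0.05, 0.1),  # inside the neutral margin, deferred
]
labels = [
    classify_sequentially(u, margin=0.25, second_stage=lengthening_rule)
    for u in calls
]
print(labels)
```

In this sketch the margin is a fixed constant. The abstract states only that the margin was selected empirically from the MFCC-only results, so any concrete value here is an assumption.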

Highlights

  • It is difficult to identify the age, intention, emotion, and gender of a speaker from telephone calls [1]

  • In February 2015, we signed a memorandum of understanding (MOU) with the NEIA of Northern Gyeonggi province to cooperate on the development of technology for social security based on emergency calls

  • We present a method for gender classification of speech from emergency calls using the machine learning technique known as support vector machines (SVM)


Summary

Introduction

It is difficult to identify the age, intention, emotion, and gender of a speaker from telephone calls [1]. This information is considered essential for automatic speech recognition (ASR). Gender classification is especially useful in ASR because acoustic models specific to each gender can be applied, which has been reported to improve performance [4]. Such classification can also be used in many other settings, such as categorizing calls by gender (e.g., for surveys) [4,5]. In particular, it is useful to detect the caller’s gender from the beginning of the call so that the call can be routed to an appropriate receiver according to the caller’s gender, in order to calm the caller if necessary.

