Abstract

The emotion accompanying the voice is considered a salient aspect of human communication, as emotion tends to alter the voice quality, timing, pitch, and articulation of the speech signal. Gender classification, on the other hand, is of interest to psychologists seeking to foster human-technology relationships. Automatic gender classification takes on an increasingly ubiquitous role in a myriad of applications, e.g., demographic data collection. An automatic gender classifier assists the development of improved male and female voice synthesizers (Childers et al., 1988). Gender classification is also used to improve speaker clustering, which is useful in speaker recognition: by clustering each gender class separately, the search space of the hierarchical agglomerative clustering algorithm is reduced (Tranter & Reynolds, 2006), and segments bearing opposite gender tags are prevented from being erroneously clustered together. Gender information is time-invariant, phoneme-independent, and identity-independent for speakers of the same gender (Wu & Childers, 1991). In (Xiaofan & Simske, 2004), an accent classification method is introduced on top of gender classification. Vergin et al. (1996) claim that the use of gender-dependent acoustic-phonetic models reduces the word error rate of a baseline speech recognition system by 1.6%. In (Harb & Chen, 2005), a set of acoustic and pitch features along with different classifiers is tested for gender identification; the fusion of features and classifiers is shown to perform better than any individual classifier. A gender classification system based on Gaussian mixture models of speech features is proposed in (Zeng et al., 2006). Metze et al. (2007) have compared four approaches for age and gender recognition using telephone speech. Gender cues elicited from the speech signal are also useful in content-based multimedia indexing (Harb & Chen, 2005). Gender-dependent speech emotion recognizers have been shown to outperform gender-independent ones for five emotional states (Ververidis & Kotropoulos, 2004; Lin & Wei, 2005) on the DES database (Engberg & Hansen, 1996); however, gender information is taken for granted there. The work most closely related to the present one is that of Xiao et al. (2007), where gender classification was incorporated into an emotional speech recognition system using a wrapper approach based on back-propagation neural networks with sequential forward selection; an accuracy of 94.65% was reported for gender classification on the Berlin dataset (Burkhardt et al., 2005). In this research, we employ several classifiers and assess their performance in gender classification by processing utterances from the DES (Engberg & Hansen, 1996), SES (Sedaaghi, 2008), and GES (Burkhardt et al., 2005) databases, all of which contain affective speech.

