Gender recognition from speech. Part II: Fine analysis

D. G. Childers, Ke Wu

doi:10.1121/1.401664

Abstract

The purpose of this research was to investigate the potential effectiveness of digital speech processing and pattern recognition techniques for the automatic recognition of gender from speech. In Part I, Coarse Analysis [K. Wu and D. G. Childers, J. Acoust. Soc. Am. 90, 1828-1840 (1991)], various feature vectors and distance measures were examined to determine their appropriateness for recognizing a speaker's gender from vowels, unvoiced fricatives, and voiced fricatives. One recognition scheme based on feature vectors extracted from vowels achieved 100% correct recognition of the speaker's gender using a database of 52 speakers (27 male and 25 female). In this paper, a detailed, fine analysis of the characteristics of vowels is performed, covering formant frequencies, bandwidths, and amplitudes, as well as the speaker's fundamental frequency of voicing. The fine analysis used a pitch-synchronous closed-phase analysis technique. The formant features were extracted by a closed-phase weighted recursive least-squares method with a variable forgetting factor (WRLS-VFF). The electroglottograph signal was used to locate the closed-phase portion of the speech signal. A two-way analysis of variance (ANOVA) was performed to test the differences between gender features. The relative importance of grouped vowel features was evaluated by a pattern recognition approach. Among the results, the second formant frequency was a slightly better recognizer of gender than fundamental frequency, giving 98.1% versus 96.2% correct recognition, respectively. The statistical tests indicated that the spectra for female speakers had a steeper slope (or tilt) than those for male speakers. The results suggest that redundant gender information was embedded in the fundamental frequency and vocal tract resonance characteristics. The feature vectors for female voices were observed to have higher within-group variations than those for male voices. The data in this study were also used to replicate portions of the Peterson and Barney [J. Acoust. Soc. Am. 24, 175-184 (1952)] study of vowels for male and female speakers.
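
The reported recognition rates can be related to the size of the database with a short calculation. The sketch below is illustrative only: it assumes that each of the 52 speakers counts as a single recognition trial, which the abstract does not state, and the function names `percent_correct` and `implied_errors` are hypothetical. Under that assumption, 98.1% and 96.2% would correspond to one and two misclassified speakers, respectively.

```python
N_SPEAKERS = 52  # 27 male + 25 female, as stated in the abstract


def percent_correct(n_correct, n_total=N_SPEAKERS):
    """Recognition rate in percent, rounded to one decimal place."""
    return round(100 * n_correct / n_total, 1)


def implied_errors(rate_percent, n_total=N_SPEAKERS):
    """Error counts whose rounded recognition rate equals rate_percent.

    Assumption (not from the paper): one recognition trial per speaker.
    """
    return [
        n_total - k
        for k in range(n_total + 1)
        if percent_correct(k, n_total) == rate_percent
    ]


if __name__ == "__main__":
    for label, rate in [("second formant", 98.1), ("fundamental frequency", 96.2)]:
        print(label, rate, "-> misclassified speakers:", implied_errors(rate))
```

If the study instead scored several tokens per speaker, the same rates would map to different error counts, so the figures above should be read as a consistency check rather than as reported data.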
