Abstract

LPC analysis is one of the most powerful techniques in speech analysis. However, spectral zeros during consonant or consonant-vowel transition regions make LPC parameters difficult to estimate. In this paper, we propose to estimate formant frequencies from the LPC model by MUSIC (Multiple Signal Classification) and ESPRIT (Estimation of Signal Parameters via Rotational Invariance Techniques). Formant candidates estimated by LS (least squares), MUSIC, and ESPRIT are combined to find an optimal solution. The effectiveness of this algorithm is verified on a place-classification task for stop consonants.

1. OVERVIEW

Classification of stop consonants remains one of the most challenging problems in speech recognition. Halberstadt (1998) [3] reported classification of phones in the TIMIT database using heterogeneous acoustic measurements and found that, for vowel classification, listener-labeler error and machine error are very similar, whereas for place classification of stop consonants machine results lag by a factor of 1.8 to 5.1. Sussman (1991) [9] investigated the locus equation and applied it to the place classification of stop consonants. He found that discriminant analysis using F2onset and F2vowel as predictors gave 76% classification accuracy, and that using the derived slope and intercept values as predictors gave 100% classification accuracy. It is generally agreed that the relatively invariant cues to stop-consonant place are coded in the dynamic spectral shape starting at the stop release. Sussman's results suggest that a compact representation of the dynamic spectra could be obtained through accurate formant estimation.

Formant frequencies are estimated from the LPC model. Spectral zeros during consonant or consonant-vowel transition regions introduce difficulties in estimating LPC parameters. This paper proposes an algorithm that improves formant estimation by combining formant candidates from different estimators. The rest of the paper is organized as follows: Section 2 reviews important properties of the LPC model; Section 3 proposes MUSIC and ESPRIT for formant estimation; Section 4 presents an algorithm that combines the formant estimates of LS, MUSIC, and ESPRIT, and demonstrates its effectiveness on place classification of stop consonants.

This work was supported by NSF award number 0132900. Statements in this paper reflect the opinions and conclusions of the authors and are not endorsed by the NSF.

2. LPC MODEL

The discrete-time speech production model can be described by [5]:

Y(z) = G(z)T(z)R(z)    (1)

where G(z) is the z-transform of the source, R(z) is the radiation impedance, and T(z) is the transfer function of the vocal tract, which takes the form of an ARMA model. In the time domain, for a stationary process {y_t} with E[y_t] = 0, the ARMA(p, q) model corresponding to Eq. (1) is

y_t + Σ_{k=1}^{p} a_k y_{t-k} = Σ_{k=0}^{q} b_k x_{t-k}    (2)

where {x_t} is the excitation (source) sequence.
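As a concrete illustration of the LS baseline named in the abstract, the Python sketch below fits an all-pole model to a single windowed frame by solving the Yule-Walker normal equations and reads formant candidates off the angles of the prediction-polynomial roots. The function name, model order, frame length, and the 90 Hz pruning threshold are illustrative assumptions, not values taken from the paper, and the sketch ignores the MA (zero) part of Eq. (1).

import numpy as np
from scipy.linalg import toeplitz

def lpc_formants(frame, fs, order=12):
    """Formant candidates from one windowed speech frame via
    autocorrelation (least-squares) LPC and polynomial rooting.
    Illustrative sketch only; not the paper's combined estimator."""
    # Autocorrelation sequence r[0], r[1], ..., of the frame.
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    # Yule-Walker normal equations: R a = -r[1..p].
    a = np.linalg.solve(toeplitz(r[:order]), -r[1:order + 1])
    # Prediction polynomial A(z) = 1 + a_1 z^-1 + ... + a_p z^-p.
    roots = np.roots(np.concatenate(([1.0], a)))
    # Keep one root per complex-conjugate pair; pole angle -> Hz.
    freqs = np.angle(roots[roots.imag > 0]) * fs / (2 * np.pi)
    # Drop near-DC poles (illustrative 90 Hz threshold), sort ascending.
    return np.sort(freqs[freqs > 90.0])

# Example on a synthetic two-resonance frame (hypothetical values);
# a little noise keeps the normal equations well conditioned.
fs = 8000
t = np.arange(240) / fs
frame = np.hamming(240) * (np.sin(2 * np.pi * 500 * t)
                           + 0.5 * np.sin(2 * np.pi * 1500 * t)
                           + 0.01 * np.random.randn(240))
print(lpc_formants(frame, fs, order=8))

In practice a bandwidth threshold on the pole radii is typically also applied before accepting a root as a formant candidate.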

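For the subspace estimators named in the abstract, the sketch below shows the textbook MUSIC recipe on a single frame: overlapping snapshots form a sample covariance matrix, the eigenvectors of its smallest eigenvalues span the noise subspace, and frequencies are read from peaks of the resulting pseudo-spectrum. This is a generic sinusoids-in-noise formulation with illustrative parameters (snapshot length m, a 2048-point grid), not the LPC-based derivation developed in Section 3.

import numpy as np

def music_frequencies(x, fs, n_sinusoids, m=40):
    """Generic spectral MUSIC on one real-valued frame (illustrative)."""
    # Overlapping length-m snapshots form an m x K data matrix.
    K = len(x) - m + 1
    X = np.column_stack([x[k:k + m] for k in range(K)])
    # Sample covariance and its eigendecomposition (ascending eigenvalues).
    _, V = np.linalg.eigh(X @ X.T / K)
    # Noise subspace: eigenvectors of the m - 2*n_sinusoids smallest eigenvalues
    # (each real sinusoid contributes a conjugate pair of complex exponentials).
    En = V[:, : m - 2 * n_sinusoids]
    # MUSIC pseudo-spectrum on a uniform frequency grid up to fs/2.
    grid = np.linspace(0.0, fs / 2, 2048)
    steer = np.exp(-2j * np.pi * np.outer(np.arange(m), grid / fs))
    pseudo = 1.0 / np.sum(np.abs(En.T @ steer) ** 2, axis=0)
    # Keep the n_sinusoids strongest local maxima of the pseudo-spectrum.
    peaks = [i for i in range(1, len(grid) - 1)
             if pseudo[i] > pseudo[i - 1] and pseudo[i] > pseudo[i + 1]]
    peaks.sort(key=lambda i: pseudo[i], reverse=True)
    return sorted(grid[i] for i in peaks[:n_sinusoids])

# Example: two real sinusoids in white noise (hypothetical values).
fs = 8000
n = np.arange(400)
x = (np.sin(2 * np.pi * 600 * n / fs)
     + np.sin(2 * np.pi * 1800 * n / fs)
     + 0.1 * np.random.randn(400))
print(music_frequencies(x, fs, n_sinusoids=2))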