Abstract

Currently, most speech recognition architectures model the speech signal as a nonoverlapping sequence of phonetic segments. A set of phonetic models is created that attempts to capture the acoustic-phonetic properties of individual phones but does not explicitly model the transitions between phones. It is readily apparent, however, that these transitions contain important information about the identity of neighboring phones. While context-dependent phonetic modeling may capture some of this information, it is likely that more explicit models of phonetic transitions could offer performance improvements. In this talk, the use of phonetic transition models will be discussed within the context of SUMMIT, a segment-based continuous speech recognition system [Zue et al., "Acoustic Segmentation and Phonetic Classification in the SUMMIT Speech Recognition System," Proc. ICASSP 89, pp. 389–392, Glasgow, Scotland (1989)]. The transition models use a feature vector based on Mel-frequency spectral coefficients (MFSCs). The vector is created by concatenating multiple spectral averages taken on both sides of a transition. For example, in one configuration a total of eight averages was used, spanning a total time interval of 150 ms. To reduce the number of dimensions, a principal component analysis was performed. A set of diagonal Gaussian models is used to model the transitions. The models were tested by applying them to the N-best sentence hypotheses from the recognition system. Each of the N-best hypotheses is rescored using a linear combination (optimized on training data) of the segment and transition scores. Initial experiments have resulted in 10%–20% reductions in word error rates.
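
The feature construction and the N-best rescoring described above can be illustrated with a minimal sketch. It is not the SUMMIT implementation: the function names, the split of four averages per side, the two-frame averaging width, and the use of a single relative weight for the linear combination are all assumptions made for illustration. The principal component analysis and the diagonal Gaussian scoring are omitted.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[float]  # one MFSC vector per analysis frame (assumed layout)
Hypothesis = Tuple[str, float, float]  # (text, segment score, transition score)


def mean_frame(frames: Sequence[Frame]) -> List[float]:
    """Average a non-empty run of frames, coefficient by coefficient."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def transition_vector(frames: Sequence[Frame], boundary: int,
                      per_side: int = 4, width: int = 2) -> List[float]:
    """Concatenate `per_side` spectral averages on each side of `boundary`.

    Each average covers `width` consecutive frames. Both defaults are
    hypothetical; the abstract only states eight averages over 150 ms.
    """
    start = boundary - per_side * width
    stop = boundary + per_side * width
    if start < 0 or stop > len(frames):
        raise ValueError("not enough frames around the boundary")
    vector: List[float] = []
    for w in range(start, stop, width):
        vector.extend(mean_frame(frames[w:w + width]))
    return vector


def rescore_nbest(hypotheses: Sequence[Hypothesis],
                  weight: float) -> List[Hypothesis]:
    """Rank hypotheses by segment + weight * transition (higher is better).

    `weight` stands in for the combination tuned on training data.
    """
    return sorted(hypotheses, key=lambda h: h[1] + weight * h[2], reverse=True)
```

With log-probability scores, a hypothesis with a slightly worse segment score can move to the top of the list once its transition score is taken into account, which is the effect the rescoring step relies on.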


