Abstract

One of the biggest difficulties in automatic speech recognition (ASR) is how to deal with variations in speech signals caused by non-linguistic information, such as the speaker's age and gender. Various methods have been proposed to compensate for these variations, and one of them is speech structure [1]. Speech structure, which extracts only contrastive features and discards absolute features, has been proven mathematically to be transform-invariant and shown experimentally to be very robust to non-linguistic variations [2]. Although the conventional speech structure extracts local and distant contrastive features, it does not explicitly extract the dynamic features that are supposed to exist in the contrastive features. In this paper, we reformulate speech structure based on the trajectory HMM and derive the trajectory structure (TSR), in which dynamic and contrastive features can be defined and used in ASR. We carry out an n-best rescoring experiment on isolated word recognition using the trajectory structure and obtain a 28.5% relative decrease in word error rate.
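
The transform-invariance claim can be illustrated with a small sketch. It is not the formulation used in this paper: it assumes that each speech event is modelled as a one-dimensional Gaussian, that the contrast between two events is measured with the Bhattacharyya distance, and that the non-linguistic variation acts as an invertible affine map x -> a*x + b. The event values and the function names `bhattacharyya_1d`, `structure`, and `affine` are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def bhattacharyya_1d(mu1, sd1, mu2, sd2):
    """Bhattacharyya distance between two 1-D Gaussians (mean, std)."""
    var_sum = sd1 ** 2 + sd2 ** 2
    mean_term = 0.25 * (mu1 - mu2) ** 2 / var_sum
    var_term = 0.5 * math.log(var_sum / (2.0 * sd1 * sd2))
    return mean_term + var_term


def structure(events):
    """Contrastive representation: all pairwise distances, no absolute values."""
    return [bhattacharyya_1d(*a, *b) for a, b in combinations(events, 2)]


def affine(events, a, b):
    """Apply x -> a*x + b to every Gaussian; a must be non-zero."""
    return [(a * mu + b, abs(a) * sd) for mu, sd in events]


# Hypothetical events as (mean, std) pairs; the numbers are illustrative only.
events = [(0.0, 1.0), (2.5, 0.8), (-1.0, 1.5), (4.0, 0.6)]

# Assumed stand-in for a non-linguistic variation (e.g. a speaker difference).
warped = affine(events, a=1.7, b=-3.0)

original_structure = structure(events)
warped_structure = structure(warped)

# The absolute features (means, stds) change, but the contrasts do not.
structures_match = all(
    math.isclose(p, q, rel_tol=1e-9, abs_tol=1e-12)
    for p, q in zip(original_structure, warped_structure)
)
print("absolute features changed:", events != warped)
print("pairwise contrasts preserved:", structures_match)
```

In this toy setting the means and standard deviations of every event change under the map, while the six pairwise distances stay the same up to floating-point rounding, which is the sense in which a purely contrastive representation discards absolute features.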
