Abstract
Traditional speech recognition separates the acoustic observations into disjoint segments such as phonemes. This paper instead describes a structural approach that attempts to incorporate speech knowledge into the representation model. The acoustic observations at any given moment are represented by a set of independent characteristics of the underlying physical system that produced them. Under this view, the coarticulation problem becomes one of horizontal alignment of the characteristics as the production system changes from one configuration to the next. Because the characteristics are independent, a flexible horizontal alignment is permitted, which substantially reduces the coarticulation problem. The model is implemented as a large ergodic HMM in which each state represents one particular combination of phonetic features, and it is constrained so that feature values change in a manner consistent with the physical laws governing the mechanical generation system. Recognition results from the new system show a marked increase in recognition rate over the more conventional approach of modeling each phoneme separately, and analysis of the recognition errors shows a substantially smaller frame error during the coarticulation period. The approach differs significantly from those previously pursued. The paper addresses several issues raised by the model, including the coarticulation problem, model constraint mechanisms, selection of appropriate features, and model implementation. Results obtained so far are very positive, indicating that the model can improve on existing recognition rates, and the reduced error during the coarticulation period points to numerous paths for further improvement.
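To make the state-space idea concrete, the following is a minimal sketch of how one state per feature combination and a constrained transition structure might be set up. It is not the paper's implementation: the feature inventory (`FEATURES`), the helper names, and the specific constraint (at most one feature changes per step, and only by one value) are all assumptions chosen for illustration, since the abstract does not specify the features or the constraint mechanism.

```python
from itertools import product

# Hypothetical feature inventory: names and value counts are illustrative only.
FEATURES = {"voicing": 2, "place": 3, "manner": 3}


def build_states(features):
    """Enumerate every combination of feature values as one HMM state."""
    return list(product(*(range(n) for n in features.values())))


def allowed(src, dst, max_changes=1):
    """Assumed constraint, not taken from the paper: at most `max_changes`
    features differ between the two states, and each changed feature moves
    by a single step."""
    diffs = [abs(a - b) for a, b in zip(src, dst)]
    return sum(d > 0 for d in diffs) <= max_changes and all(d <= 1 for d in diffs)


def transition_matrix(states, max_changes=1):
    """Uniform probabilities over the allowed transitions (self-loops included).
    In practice these values would be re-estimated from training data."""
    matrix = []
    for src in states:
        mask = [1.0 if allowed(src, dst, max_changes) else 0.0 for dst in states]
        total = sum(mask)  # always >= 1 because the self-loop is allowed
        matrix.append([m / total for m in mask])
    return matrix


states = build_states(FEATURES)
A = transition_matrix(states)
```

Under these assumptions, forbidden transitions receive zero probability, while every state remains reachable from every other over several steps. Because each feature may change at its own time, the features need not switch simultaneously at a phoneme boundary, which is the flexible alignment the abstract refers to.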