Abstract
This paper describes the application of a one-stage Dynamic Programming (DP) algorithm to the acoustic-phonetic decoding stage of a speech recognition system. Essentially two methods are compared: (1) the recognition of demisyllables, and (2) the recognition of consonant clusters and syllabic nuclei (vowels or diphthongs). Demisyllables, consonant clusters, and vowels or diphthongs have proved their usefulness in the recognition of continuous speech. Furthermore, Dynamic Programming is a well-established principle for the recognition of connected words. This paper presents an acoustic-phonetic decoding stage that applies Dynamic Programming to these phonetic units. The algorithm is essentially the same as the one commonly known from connected word recognition, which does not need explicit segmentation. However, for the recognition of demisyllables, consonant clusters and vowels, some specific modifications were necessary, including the introduction of a special internal syntax between the units. The derived algorithm was tested on fluently spoken sentences from a 75-word lexicon in which the most frequent consonant clusters and vowels of the German language were represented. Different tests were made to compare the recognition accuracy of a context-dependent application (taking neighbouring units into account) with that of a context-independent application. These results, and a comparison with a similar acoustic-phonetic stage that uses explicit segmentation, are presented.
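As a rough illustration of how a one-stage DP search can label an utterance without explicit segmentation, the following minimal sketch matches input frames against unit templates and allows an optional syntax constraint between consecutive units. It is not the implementation evaluated in the paper: the function name `one_stage_dp`, the `allowed` callback, the scalar "features", the absolute-difference local distance, and the simple stay/advance transitions are all assumptions made for the example.

```python
import math

def one_stage_dp(frames, templates, allowed=None):
    """Label `frames` with a sequence of template names (hypothetical sketch).

    frames:    list of scalar feature values (a stand-in for real feature vectors)
    templates: dict mapping unit name -> list of scalar reference frames
    allowed:   allowed(prev, nxt) -> bool; prev is None at the utterance start
    Returns (labels, total_distance).
    """
    if allowed is None:
        allowed = lambda prev, nxt: True
    if not frames:
        return [], 0.0
    names = list(templates)
    INF = math.inf
    # cost[k][j]: best accumulated distance ending on reference frame j of unit k
    # entry[k][j]: (start_time, predecessor_unit) of the current pass through unit k
    cost = {k: [INF] * len(templates[k]) for k in names}
    entry = {k: [None] * len(templates[k]) for k in names}
    ends = []  # ends[i][k] = (cost, entry) for unit k ending at input frame i

    for i, x in enumerate(frames):
        new_cost, new_entry = {}, {}
        for k in names:
            ref = templates[k]
            # Cheapest permitted way to enter unit k at input frame i.
            if i == 0:
                enter = (0.0, None) if allowed(None, k) else (INF, None)
            else:
                enter = min(
                    ((ends[i - 1][p][0], p) for p in names if allowed(p, k)),
                    key=lambda c: c[0],
                    default=(INF, None),
                )
            c_row, e_row = [], []
            for j in range(len(ref)):
                candidates = [(cost[k][j], entry[k][j])]  # stay on frame j
                if j > 0:
                    candidates.append((cost[k][j - 1], entry[k][j - 1]))  # advance
                else:
                    candidates.append((enter[0], (i, enter[1])))  # start the unit
                best = min(candidates, key=lambda c: c[0])
                c_row.append(best[0] + abs(x - ref[j]))  # assumed local distance
                e_row.append(best[1])
            new_cost[k], new_entry[k] = c_row, e_row
        cost, entry = new_cost, new_entry
        ends.append({k: (cost[k][-1], entry[k][-1]) for k in names})

    # Trace back from the best unit ending on the last input frame.
    last = min(names, key=lambda k: ends[-1][k][0])
    total = ends[-1][last][0]
    if total == INF:
        return [], INF
    labels, i, k = [], len(frames) - 1, last
    while k is not None:
        labels.append(k)
        start, pred = ends[i][k][1]
        i, k = start - 1, pred
    labels.reverse()
    return labels, total
```

For example, `one_stage_dp([1.0, 1.0, 1.0, 5.0, 5.0, 9.0], {"a": [1.0, 1.0], "b": [5.0, 5.0], "c": [9.0]})` segments and labels the input in a single pass. Passing an `allowed` function that only permits certain unit successions is one simple way to mimic a syntax between units, such as alternating consonant clusters and vowels.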