Abstract

One of the basic problems in Automatic Recognition of Continuous Speech concerns the representation of talker performance. Phonetically, the problem creates a requirement at the lexical level for either “narrow” transcriptions or a variety of “broad” transcriptions to represent any word. If pattern classification procedures are used in acoustic analysis, the former presents many problems involving inventory size. More importantly, a question arises concerning the appropriateness of the referent employed in performance evaluation. For speech, the referent (a description of what the talker actually produced rather than intended) is often difficult to obtain, even with a highly trained phonetician. It is clear that a single “ideal” transcription for each lexical item is inappropriate. Yet traditionally, in speech recognition work involving phonetic classification, a single transcription is provided for each of the words to be recognized. Systems have been proposed in which partial representation of “phonemes” may provide sufficient information for more sophisticated, i.e., context-dependent, processing at a later stage. However, it is not clear that these systems will be able to account for some of the dissimilatory, assimilatory, and/or idiolectal variants that a phonetician would certainly use in describing talker performance.
