Abstract

One of the basic problems in the automatic recognition of continuous speech concerns the representation of talker performance. Phonetically, the problem creates a requirement at the lexical level for either “narrow” transcriptions or a variety of “broad” transcriptions to represent any word. If pattern classification procedures are used in acoustic analysis, the former presents many problems involving inventory size. More importantly, a question arises concerning the appropriateness of the referent employed in performance evaluation. For speech, the referent (a description of what the talker actually produced rather than intended) is often difficult to obtain, even with a highly trained phonetician. It is clear that a single “ideal” transcription for each lexical item is inappropriate. Traditionally, in speech recognition work involving phonetic classification, a single transcription is provided for each of the words to be recognized. Systems have been proposed in which partial representation of “phonemes” may provide sufficient information for more sophisticated, i.e., context-dependent, processing at a later stage. However, it is not clear that these systems will be able to account for some of the dissimilatory, assimilatory, and/or idiolectal variants that a phonetician would certainly use in describing talker performance.
