Abstract
The HEARSAY speech understanding system uses acoustic-phonetic knowledge at basically four levels: lexical, syllabic, phonemic, and subsegmental. The lexicon is a list of phonemic spellings of all the words of the given language, possibly with alternates. The system recognizes possible word juncture problems from the phonemic dictionary; for example, the juncture between the words “got to” may be realized with one or two stop releases, and the system is sensitive to both possibilities. At a (roughly) syllabic level, the speech is segmented into the three classes of “voiced,” “silent,” and “fricated” (voiced or unvoiced). (In addition, a “voiced” segment is further subdivided if it contains a significant amplitude minimum between significant maxima.) This segmentation, while crude, is very reliable for many first-level approximations. At the phonemic level, the system matches an expectation derived from the dictionary spelling to the actual utterance; each candidate word is rated as a function of the goodness of this phonemic-phonetic match. At the subsegmental level, a label is assigned to each 10-msec sample of speech. The inventory of labels is intended to correspond to the steady-state portions of the sustained phone types of English. This labeling is used to characterize each 10-msec sample for further higher-level processing.
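The abstract does not specify how the crude syllabic-level segmentation is computed. The sketch below is a minimal illustration, assuming a simple energy and zero-crossing heuristic over fixed 10-msec frames; the thresholds, feature choices, and function names are assumptions for exposition and do not reflect the HEARSAY implementation.

```python
import numpy as np

# Hypothetical sketch of the three-class segmentation ("voiced" / "silent" /
# "fricated") over 10-msec frames, plus subdivision of a voiced run at a
# significant amplitude minimum between significant maxima.
# All thresholds are illustrative assumptions, not HEARSAY parameters.

FRAME_MS = 10

def classify_frames(samples, sample_rate, silence_thresh=0.01, zcr_thresh=0.3):
    """Label each 10-msec frame and return (labels, per-frame RMS energies)."""
    frame_len = int(sample_rate * FRAME_MS / 1000)
    labels, energies = [], []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = np.sqrt(np.mean(frame ** 2))                    # frame energy
        zcr = np.mean(np.abs(np.diff(np.sign(frame))) > 0)    # zero-crossing rate
        if rms < silence_thresh:
            labels.append("silent")
        elif zcr > zcr_thresh:
            labels.append("fricated")   # noise-like, high-frequency frame
        else:
            labels.append("voiced")
        energies.append(rms)
    return labels, energies

def split_voiced_runs(labels, energies, dip_ratio=0.5):
    """Insert a boundary inside a voiced run where energy dips to a significant
    minimum between two significant maxima (crude syllable-like subdivision)."""
    boundaries, run_start = [], None
    for i, lab in enumerate(labels + [None]):   # trailing None flushes the last run
        if lab == "voiced" and run_start is None:
            run_start = i
        elif lab != "voiced" and run_start is not None:
            run = np.array(energies[run_start:i])
            if len(run) > 2:
                mn = int(np.argmin(run[1:-1])) + 1   # interior minimum
                if run[mn] < dip_ratio * min(run[:mn].max(), run[mn + 1:].max()):
                    boundaries.append(run_start + mn)
            run_start = None
    return boundaries
```

In this sketch the frame labels stand in for the subsegmental level, and the run boundaries stand in for the syllabic segmentation that higher levels would consume; the actual system's features, label inventory, and decision rules are not given in the abstract.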