Abstract

Over the past several years, I have been conducting research on subword modeling in speech recognition. The research is aimed primarily at the difficult task of identifying and characterizing unknown words, although the proposed framework is also useful in other recognition tasks, such as phonological and prosodic modeling. The approach exploits the linguistic substructure of words by describing graphemic, phonemic, phonological, syllabic, and morphemic constraints through a set of context-free rules and supporting the resulting parse trees with a corpus-trained probability model. A derived finite-state transducer representation provides a natural means of integrating the trained model into a recognizer search. This paper describes several research projects that I have pursued, together with my students and associates, to explore ways in which recognition tasks can benefit from such formal modeling of word substructure. These include phonological modeling, hierarchical duration modeling, sound-to-letter and letter-to-sound mapping, and automatic acquisition of unknown words in a speech understanding system. Results of several experiments in these areas are summarized here.
