Abstract
In large-vocabulary word recognition systems, it is important to know the phonetic properties of the vocabulary. This paper presents observed phonetic properties of the 5000 most frequent words in the Brown Corpus and describes a method for evaluating the effects of phoneme recognition errors on word recognition. The study was conducted for two cases: the 5000-word vocabulary with one standard pronunciation per word, and the same vocabulary with multiple pronunciations per word. The distance between two words was defined as the number of phoneme pairs that differ between them, taking phoneme deletion and insertion into account. This distance was calculated for every word pair using dynamic programming. Word pairs with distances 0, 1, and 2 were analyzed in detail, yielding properties of the vocabulary that provide useful information for designing a word recognition system. Relations among phoneme recognition score, word recognition score, and vocabulary size were also investigated.
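The word-pair distance described in the abstract can be illustrated with a minimal sketch. It assumes unit cost for each phoneme substitution, deletion, and insertion (a standard edit distance); the abstract does not give the exact cost scheme, so this is one plausible reading and not the authors' implementation. The function name `phoneme_distance` and the ARPAbet-style transcriptions are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def phoneme_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Unit-cost edit distance between two phoneme sequences.

    Assumption: substitution, deletion, and insertion each cost 1.
    """
    # prev[j] holds the distance between the first i-1 phonemes of `a`
    # and the first j phonemes of `b`.
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]  # turning i phonemes into an empty sequence takes i deletions
        for j, pb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete pa
                curr[j - 1] + 1,           # insert pb
                prev[j - 1] + (pa != pb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


# Hypothetical transcriptions, for illustration only.
to = ["T", "UW"]
two = ["T", "UW"]
cat = ["K", "AE", "T"]
cap = ["K", "AE", "P"]
cats = ["K", "AE", "T", "S"]

print(phoneme_distance(to, two))    # 0: homophones cannot be told apart
print(phoneme_distance(cat, cap))   # 1: one substituted phoneme
print(phoneme_distance(cat, cats))  # 1: one inserted phoneme
```

Under this reading, a pair at distance 0 is indistinguishable from phonemes alone, and a pair at distance 1 can be confused by a single phoneme recognition error, which suggests why the low-distance pairs receive the detailed analysis.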