Abstract

In large-vocabulary word recognition systems, it is important to know the phonetic properties of the vocabulary. This paper presents observed phonetic properties of the 5000 most frequent words in the Brown Corpus and describes a method for evaluating the effects of phoneme recognition errors on word recognition. The study covers two cases: the 5000-word vocabulary with one standard pronunciation per word, and the same vocabulary with multiple pronunciations per word. The distance between two words was defined as the number of phoneme pairs that differ between them, taking phoneme deletion and insertion into account, and was calculated for every word pair using dynamic programming. Word pairs at distances 0, 1, and 2 were analyzed in detail, yielding properties of the vocabulary that provide useful information for designing a word recognition system. The relations among phoneme recognition score, word recognition score, and vocabulary size were also investigated.
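The distance described above resembles an edit distance over phoneme sequences. The following is a minimal sketch of such a computation, not the paper's implementation. It assumes a uniform cost of 1 for each substitution, insertion, and deletion, which the abstract does not specify. The names `phoneme_distance`, `close_pairs`, and `LEXICON`, along with the toy pronunciations, are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def phoneme_distance(a, b):
    """Edit distance between two phoneme sequences via dynamic programming.

    Assumption: substitution, insertion, and deletion each cost 1.
    """
    # prev[j] holds the distance between the processed prefix of `a` and b[:j].
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty sequence
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(
                prev[j] + 1,         # delete pa
                curr[j - 1] + 1,     # insert pb
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


# Hypothetical toy lexicon: one pronunciation per word, illustrative symbols.
LEXICON = {
    "cat": ["k", "ae", "t"],
    "cap": ["k", "ae", "p"],
    "cast": ["k", "ae", "s", "t"],
    "two": ["t", "uw"],
    "too": ["t", "uw"],
}


def close_pairs(lexicon, max_distance=2):
    """Return {(word1, word2): distance} for all pairs within max_distance."""
    result = {}
    for (w1, p1), (w2, p2) in combinations(sorted(lexicon.items()), 2):
        d = phoneme_distance(p1, p2)
        if d <= max_distance:
            result[(w1, w2)] = d
    return result


if __name__ == "__main__":
    for pair, d in sorted(close_pairs(LEXICON).items(), key=lambda kv: kv[1]):
        print(pair, d)
```

In this sketch, pairs at distance 0 are homophones, which no phoneme-level recognizer can separate without further context, while pairs at distance 1 or 2 are the ones most likely to be confused after a small number of phoneme recognition errors. Comparing every pair in a 5000-word vocabulary requires roughly 12.5 million such comparisons.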
