Abstract

A problem encountered in speech synthesis using concatenation methods is the unnaturalness and lowered intelligibility that arise from reducing the duration of stored units (diphones, demisyllables, words, etc.). In word-concatenation systems, this problem is most severe in a sequence of function words, where extreme contraction is necessary to approach natural-speech prosody. We suggest this problem can be circumvented by storing not only isolated words but also certain sequences of words. To this end, we studied the frequency of occurrence of two- and three-word sequences in English, based on the million-word corpus of Kucera & Francis. We find that many sequences of extremely common words occur more frequently than all but the most frequent single words. For example, “of the” is one of the ten most common “words” of English; 17 two-word sequences have corpus frequencies greater than 1000, making them more common than “much”, “well”, “should”, “how”, etc. We suggest that the results of our study allow one to properly select a “word”-concatenation vocabulary. Our word-sequence frequency tables should be useful to psychologists and workers in speech recognition as well.
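
The counting step described above can be illustrated with a minimal sketch. It is not the procedure used in the study: the tokenization rule, the function names (`tokenize`, `ngram_counts`, `select_vocabulary`), the `min_count` threshold, and the toy sentence are all assumptions made for illustration, and sentence boundaries are ignored.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenization: lowercase runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(tokens, n):
    # Count every run of n consecutive tokens in the token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def select_vocabulary(tokens, max_n=3, min_count=2):
    # Hypothetical selection rule: keep every sequence of 1..max_n words
    # that occurs at least min_count times, treating each kept sequence
    # as one storable "word" of the concatenation vocabulary.
    vocab = {}
    for n in range(1, max_n + 1):
        for gram, count in ngram_counts(tokens, n).items():
            if count >= min_count:
                vocab[" ".join(gram)] = count
    return vocab


# Toy input, not the Kucera & Francis corpus.
toy_text = "the end of the road is the start of the trail and of the climb"
tokens = tokenize(toy_text)
vocab = select_vocabulary(tokens, max_n=3, min_count=2)

# List single words and word sequences together, most frequent first.
for unit, count in sorted(vocab.items(), key=lambda item: -item[1]):
    print(count, unit)
```

Ranking single words and multi-word sequences in one table is what makes the comparison in the abstract possible: a sequence such as “of the” can then be seen to outrank most single words. On a full corpus the threshold would be set far higher than in this toy example.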
