Statistics of diphones and triphones presence on the word boundaries in the Polish language. Applications to ASR

Bartosz Ziolko, Piotr Zelasko, Dawid Skurzok

doi:10.1109/pvc.2014.6845418

Abstract

Recognition of continuous speech is one of the major challenges in automatic speech recognition (ASR), especially in phonetically complex languages (e.g. Polish). To improve ASR of the Polish language, we collected phoneme statistics to locate diphones and triphones within running speech sequences. We found that these clusters are more likely to occur across word boundaries than within words. Our research identified the most frequently appearing diphones and triphones in a natural speech corpus (Corpora), and we normalized these data for the Polish language at large. The results can be used in various ASR systems, e.g. by the speech recognizer module to improve word boundary recognition, or to recognize non-dictionary words embedded in a natural sentence (e.g. proper names).
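As an illustration of the kind of statistic described above, the following minimal sketch counts phoneme n-grams (diphones for n=2, triphones for n=3) separately for those that span a word boundary and those that lie inside a single word. It is not the authors' implementation: the function name `boundary_ngram_counts`, the input format (utterances as lists of words, words as lists of phoneme symbols), and the toy transcriptions are assumptions made for this example only.

```python
from collections import Counter


def boundary_ngram_counts(utterances, n=2):
    """Count phoneme n-grams across word boundaries vs. inside words.

    `utterances` is a list of utterances; each utterance is a list of words;
    each word is a list of phoneme symbols (hypothetical input format).
    """
    across, within = Counter(), Counter()
    for words in utterances:
        # Flatten the utterance, remembering the word index of each phoneme.
        phones = [(p, i) for i, word in enumerate(words) for p in word]
        for k in range(len(phones) - n + 1):
            window = phones[k:k + n]
            gram = tuple(p for p, _ in window)
            # The n-gram spans a boundary if its first and last phonemes
            # belong to different words.
            if window[0][1] != window[-1][1]:
                across[gram] += 1
            else:
                within[gram] += 1
    return across, within


# Toy data with made-up phoneme symbols, not taken from the corpus.
utterances = [
    [["t", "a", "k"], ["s", "a", "m", "o"]],
    [["k", "o", "t"], ["s", "p", "i"]],
]
across, within = boundary_ngram_counts(utterances, n=2)
print(across.most_common())
```

Dividing each count in `across` by the total number of boundary n-grams would give relative frequencies, which is one simple way such counts could be normalized; the normalization actually used in the paper is not specified in the abstract.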

Full Text