Abstract

A method was proposed for the posterior use of prosodic features to ensure correct recognition and to detect recognition errors. Fundamental frequency (F0) contours are generated for the recognition hypotheses using prosodic rules developed for speech synthesis and are compared with the observed contour. Partial analysis-by-synthesis absorbs unexpected variations in the observed contour. The method can detect recognition errors that are accompanied by accent type changes and/or syntactic boundary shifts. Although syntactic boundaries are useful for speech recognition, detecting them through prior use of F0 contours is sometimes difficult because they are less clearly marked in the contours. The method was therefore evaluated to determine how well it can detect syntactic boundaries using pitch information. Preliminary results given by K. Hirose and A. Sakurai [Proc. ICASSP-96, 809–812 (1996)] were further validated on the ATR continuous speech conference registration database, which includes 37 major syntactic boundaries that are not preceded by long pauses but are accompanied by F0 rises reflecting phrase components. The method identifies these boundaries with 92% accuracy within a 2-mora position error and with 86% accuracy within a 1-mora position error. Discussion will extend to augmenting this method with statistical techniques such as HMMs.
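
The accuracy figures above are stated relative to a position error tolerance measured in morae. As a minimal sketch of how such a tolerance-based score could be computed, the Python snippet below counts a reference boundary as detected when any detected boundary lies within the tolerance. The function name `boundary_accuracy`, the representation of boundaries as integer mora indices, and the example positions are all assumptions made for illustration; they are not taken from the paper, and the authors' actual scoring procedure may differ (for instance, it may enforce one-to-one matching).

```python
from typing import Sequence


def boundary_accuracy(
    reference: Sequence[int], detected: Sequence[int], tolerance: int
) -> float:
    """Fraction of reference boundaries that have at least one detected
    boundary within `tolerance` morae (hypothetical scoring rule)."""
    if not reference:
        raise ValueError("reference must contain at least one boundary")
    hits = sum(
        1
        for ref in reference
        if any(abs(ref - det) <= tolerance for det in detected)
    )
    return hits / len(reference)


# Hypothetical mora indices, for illustration only (not data from the paper).
reference = [5, 12, 20, 31]
detected = [5, 13, 22, 40]

acc_1 = boundary_accuracy(reference, detected, tolerance=1)
acc_2 = boundary_accuracy(reference, detected, tolerance=2)
print(f"within 1 mora: {acc_1:.2f}, within 2 morae: {acc_2:.2f}")
```

Loosening the tolerance can only keep or increase the score, which is consistent with the 2-mora figure in the abstract being higher than the 1-mora figure.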
