Abstract

To make effective use of prosodic features in automatic speech recognition, a method was proposed that checks the suitability of a recognition candidate through its fundamental frequency contour. In this method, a fundamental frequency contour is generated for each recognition candidate and compared with the observed contour. The contours are generated from prosodic rules previously developed for text-to-speech conversion, and the comparison is performed only on the portion with recognition ambiguity, using a newly developed scheme called partial analysis-by-synthesis. The candidate whose contour best matches the observed contour is selected as the final recognition result. The method was shown to be valid for detecting recognition errors accompanied by changes in accent types and/or syntactic boundaries, and its performance in detecting phrase boundaries was also evaluated. The results indicated that it can detect boundaries either at the correct location or with a location error of at most one mora.
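
The selection step described above can be illustrated with a minimal sketch. It assumes that a contour has already been generated for each candidate and that the ambiguous portion is given as a frame range. The abstract does not specify the distance measure, so the mean squared difference of log F0 used here, the convention that a non-positive value marks an unvoiced frame, and all function and variable names are assumptions made for illustration. The sketch only compares fixed contours; it does not perform the parameter fitting that an analysis-by-synthesis scheme would involve.

```python
import math


def log_f0_distance(observed, generated, start, end):
    """Mean squared log-F0 difference over frames [start, end).

    Frames where either contour is non-positive are treated as unvoiced
    and skipped (an assumed convention, not taken from the source).
    """
    total, count = 0.0, 0
    for obs, gen in zip(observed[start:end], generated[start:end]):
        if obs > 0 and gen > 0:
            total += (math.log(obs) - math.log(gen)) ** 2
            count += 1
    # With no voiced frames to compare, report an infinite distance.
    return total / count if count else math.inf


def select_candidate(observed, candidates, start, end):
    """Return the name of the candidate whose contour is closest to the
    observed one within the ambiguous frame range [start, end)."""
    return min(
        candidates,
        key=lambda name: log_f0_distance(observed, candidates[name], start, end),
    )


# Hypothetical F0 values in Hz, one per frame; 0 marks an unvoiced frame.
observed = [120.0, 130.0, 150.0, 140.0, 0.0, 110.0]
candidates = {
    "A": [120.0, 130.0, 148.0, 138.0, 0.0, 112.0],
    "B": [120.0, 130.0, 110.0, 105.0, 0.0, 100.0],
}

# Only frames 2..5 are assumed to be ambiguous between the two candidates.
print(select_candidate(observed, candidates, 2, 6))  # A
```

Restricting the comparison to the ambiguous range keeps frames on which the candidates agree from diluting the difference between them.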
