Abstract
This paper describes a method of speaker-independent connected-word recognition based on segmentation that is robust to speaker variation. To normalize variation among speakers, the input speech pattern is transformed, through segmentation and labeling, into a sequence of phonemically labeled segments (a phoneme string), which varies less from speaker to speaker. Connected-word recognition is then carried out by applying a two-level DP matching algorithm to that phoneme string. The input speech pattern is deliberately oversegmented to avoid segment omissions, which cause fatal errors in word recognition. The number of segments corresponding to one phoneme should depend on the phoneme: vowels should be allowed more segments than consonants. From this viewpoint, we propose a method that adapts the matching path to each phoneme at the word-matching level of the dynamic programming. In spoken-word recognition experiments on sequences of one to four connected digits, the recognition rate averaged over seven male speakers was about 90% for individual words and about 80% for word sequences. When the words were spoken clearly, these rates improved to 93.8% and 86.0%, respectively, on average.
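The idea of a phoneme-dependent matching path can be illustrated with a minimal sketch. It is not the authors' algorithm: it covers only the word-matching level (not the full two-level DP), uses a 0/1 label-mismatch cost, and assumes hypothetical limits of at most three segments per vowel and two per consonant. The names `VOWELS`, `MAX_SEGS`, `max_segments`, and `word_distance` are invented for this illustration.

```python
# Hypothetical limits on how many consecutive segments one phoneme may absorb.
# The abstract states only that vowels should get more segments than consonants;
# the concrete values 3 and 2 are assumptions made for this sketch.
VOWELS = set("aiueo")
MAX_SEGS = {"vowel": 3, "consonant": 2}


def max_segments(phoneme):
    """Return the assumed maximum number of segments for a phoneme."""
    return MAX_SEGS["vowel"] if phoneme in VOWELS else MAX_SEGS["consonant"]


def word_distance(segments, template):
    """Minimum mismatch cost of aligning all segment labels to a word template.

    Each template phoneme absorbs between 1 and max_segments(phoneme)
    consecutive segments; a segment whose label differs from the phoneme
    costs 1. Returns infinity if no admissible alignment exists.
    """
    inf = float("inf")
    n, m = len(segments), len(template)
    # cost[j][i]: best cost of matching the first i segments
    # against the first j template phonemes.
    cost = [[inf] * (n + 1) for _ in range(m + 1)]
    cost[0][0] = 0
    for j in range(1, m + 1):
        p = template[j - 1]
        for i in range(1, n + 1):
            mismatch = 0
            # Let phoneme p absorb the last k segments ending at position i.
            for k in range(1, max_segments(p) + 1):
                if i - k < 0:
                    break
                mismatch += segments[i - k] != p
                candidate = cost[j - 1][i - k] + mismatch
                if candidate < cost[j][i]:
                    cost[j][i] = candidate
    return cost[m][n]
```

Under these assumptions, an oversegmented label string such as `n n a a a` aligns to the template `na` at zero cost, because the consonant may absorb two segments and the vowel three. In a full recognizer, word-level distances of this kind would feed a second DP level that searches over word boundaries.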