Abstract

Recognition of spoken names is an important ASR task because many speech applications rely on it. However, it is also among the most difficult, owing to the large number of names, their varying origins, and the multiple valid pronunciations of any given name, which depend largely on the speaker’s mother tongue and familiarity with the name. To explore the speaker- and language-dependent variability in name pronunciation, a spoken name database was collected from 101 speakers with varying native languages. Each speaker was asked to pronounce 80 polysyllabic names, chosen uniformly from ten language origins. In preliminary experiments, various prosodic features were used to train Gaussian mixture models (GMMs) that identify misplaced syllabic emphasis within a name with roughly 85% accuracy. Articulatory features (voicing, place, and manner of articulation) derived from MFCCs were also incorporated for that purpose. The combined prosodic and articulatory features were used to automatically grade the quality of name pronunciation, and the resulting scores can provide meaningful feedback to foreign language learners. A detailed description of the name database and preliminary results on the accuracy of detecting misplaced stress patterns will be reported.
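
The abstract does not specify the prosodic feature set or the GMM configuration, so the following is only a minimal sketch of the general idea: fit one GMM to syllables with correct stress and another to syllables with misplaced stress, then label a new syllable by comparing the two log-likelihoods. Everything concrete here is an assumption made for illustration: the three-dimensional feature vector (a duration ratio, a pitch change, an energy change), the synthetic data, the two-component diagonal-covariance mixtures, and the names `DiagGMM`, `make_samples`, and `demo`. It is not the authors' system and does not reproduce their reported accuracy.

```python
import math
import random


def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


class DiagGMM:
    """Diagonal-covariance Gaussian mixture fitted with plain EM (illustrative)."""

    def __init__(self, n_components=2, n_iter=25, var_floor=1e-3, seed=0):
        self.k, self.n_iter = n_components, n_iter
        self.var_floor, self.seed = var_floor, seed

    def _component_logs(self, x):
        # log(weight_j) + log N(x | mean_j, var_j) for every component j
        return [math.log(w) + log_gauss(x, m, v)
                for w, m, v in zip(self.weights, self.means, self.vars)]

    def fit(self, data):
        n, dim = len(data), len(data[0])
        rng = random.Random(self.seed)
        # Initialise means from random data points, variances from the global spread.
        self.means = [list(x) for x in rng.sample(data, self.k)]
        centre = [sum(x[d] for x in data) / n for d in range(dim)]
        spread = [max(sum((x[d] - centre[d]) ** 2 for x in data) / n, self.var_floor)
                  for d in range(dim)]
        self.vars = [list(spread) for _ in range(self.k)]
        self.weights = [1.0 / self.k] * self.k
        for _ in range(self.n_iter):
            # E-step: posterior responsibility of each component for each sample.
            resp = []
            for x in data:
                logs = self._component_logs(x)
                norm = logsumexp(logs)
                resp.append([math.exp(v - norm) for v in logs])
            # M-step: re-estimate weights, means, and floored variances.
            for j in range(self.k):
                nj = max(sum(r[j] for r in resp), 1e-8)
                self.weights[j] = nj / n
                self.means[j] = [sum(r[j] * x[d] for r, x in zip(resp, data)) / nj
                                 for d in range(dim)]
                self.vars[j] = [
                    max(sum(r[j] * (x[d] - self.means[j][d]) ** 2
                            for r, x in zip(resp, data)) / nj, self.var_floor)
                    for d in range(dim)]
        return self

    def score(self, x):
        """Log-likelihood of one feature vector under the mixture."""
        return logsumexp(self._component_logs(x))


def make_samples(center, spread, n, rng):
    """Synthetic stand-in for per-syllable prosodic features (assumption)."""
    return [tuple(rng.gauss(c, s) for c, s in zip(center, spread)) for _ in range(n)]


def demo():
    rng = random.Random(1)
    spread = (0.2, 0.8, 1.0)
    # Hypothetical class centres: (duration ratio, pitch change, energy change).
    ok_center, bad_center = (1.4, 2.0, 3.0), (0.7, -2.0, -3.0)
    gmm_ok = DiagGMM().fit(make_samples(ok_center, spread, 200, rng))
    gmm_bad = DiagGMM().fit(make_samples(bad_center, spread, 200, rng))
    held_out = [(x, False) for x in make_samples(ok_center, spread, 100, rng)]
    held_out += [(x, True) for x in make_samples(bad_center, spread, 100, rng)]
    # A syllable is flagged as misplaced stress when that model explains it better.
    hits = sum((gmm_bad.score(x) > gmm_ok.score(x)) == is_bad
               for x, is_bad in held_out)
    return hits / len(held_out)


if __name__ == "__main__":
    print("held-out accuracy on synthetic data:", demo())
```

The synthetic classes are deliberately well separated, so the accuracy this sketch prints says nothing about real speech. With real recordings, the feature extraction step and the number of mixture components would need to be chosen and validated on held-out speakers.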

