Abstract

This paper further develops a previously proposed adaptation method for speech recognition called symbolic speaker adaptation (SSA). The basic idea of SSA is to model a speaker's pronunciation as a blend of speech varieties (SVs), i.e., regional dialects and foreign accents, for which the system has existing pronunciation models. During adaptation, the system determines the relative applicability of those models, yielding a speech variety profile (SVP) for each speaker. Speaker-dependent lexica for recognition are then derived from the speaker's SVP. We discuss a series of experiments designed to analyze how the SSA method is affected by SV-balanced training, expanded phone inventories, reduced amounts of adaptation data, and speech from SVs not modeled by the system. The largest improvements were obtained by using expanded (SV-inclusive) phone inventories. SSA was also shown to be effective with a very small number of adaptation sentences. Finally, for speakers of novel (unseen) SVs, SSA's SV blending scheme yields higher accuracy than an SV classification scheme.
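To make the contrast between blending and classification concrete, the following is a minimal sketch, not the paper's implementation. It assumes that an SVP can be represented as non-negative weights over SVs and that each SV's pronunciation model is a table of pronunciation probabilities per word. The names `blend_lexica` and `classify_lexicon`, the example SV labels, and the phone strings are all hypothetical.

```python
from typing import Dict

# Assumed representation: word -> {pronunciation: probability}
Lexicon = Dict[str, Dict[str, float]]


def blend_lexica(sv_lexica: Dict[str, Lexicon], svp: Dict[str, float]) -> Lexicon:
    """Build a speaker-dependent lexicon as an SVP-weighted mixture (assumed scheme)."""
    total = sum(svp.values())
    if total <= 0:
        raise ValueError("SVP weights must sum to a positive value")
    blended: Lexicon = {}
    for sv, lexicon in sv_lexica.items():
        weight = svp.get(sv, 0.0) / total  # normalize so weights sum to 1
        for word, prons in lexicon.items():
            entry = blended.setdefault(word, {})
            for pron, prob in prons.items():
                entry[pron] = entry.get(pron, 0.0) + weight * prob
    return blended


def classify_lexicon(sv_lexica: Dict[str, Lexicon], svp: Dict[str, float]) -> Lexicon:
    """Classification baseline: use only the single highest-weighted SV."""
    best_sv = max(svp, key=lambda sv: svp[sv])
    return sv_lexica[best_sv]


# Hypothetical two-SV example with one word.
sv_lexica = {
    "sv_a": {"bath": {"b ae th": 1.0}},
    "sv_b": {"bath": {"b aa th": 1.0}},
}
svp = {"sv_a": 0.75, "sv_b": 0.25}

speaker_lexicon = blend_lexica(sv_lexica, svp)
baseline_lexicon = classify_lexicon(sv_lexica, svp)
```

Under these assumptions, the blended lexicon keeps both pronunciations of the example word with weights taken from the SVP, whereas the classification baseline discards the minority variant entirely. This is one plausible reading of why blending could help a speaker whose SV is not among those modeled.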
