Abstract

This work is aimed at enhancing the speaker‐independent performance of word‐based speech recognition systems by rapidly and automatically deducing general characteristics of the current speaker and using them to derive speaker‐normalizing transforms. Dynamic programming (DP) matching is used to align and compare corresponding frames of the incoming speech and reference vocabulary. A single transform is then computed for all voiced speech and another for all unvoiced speech. The transforms consist of a linear filtering component and, optionally, a constrained frequency shift. Experiments have been carried out with twenty male and female, native and non‐native English speakers, each producing 150 digits. Adaptation on all 150 digits reduces recognition errors by a factor of three (from 4.5% to 1.5%). With adaptation on just three randomly selected digits, the reduction factor is two. Frequency shifting is useful only when the amount of adaptation material is large and the reference speech is not exclusively from the same sex as the current speaker. Best performance is obtained using a transform without frequency shifting and with all input and reference speech from the same sex. [Work supported by DCIEM, Department of National Defence, Canada.]
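
The abstract does not give implementation details, but the alignment-and-transform idea can be sketched in code. The snippet below is a minimal illustration, assuming that frames are log-spectral vectors, that voiced/unvoiced decisions are already available per frame, and that the "linear filtering component" can be approximated by a per-band additive correction in the log-spectral domain estimated by simple averaging over aligned frame pairs; the paper's actual transform, and its optional constrained frequency shift, are not reproduced here. All function and variable names are hypothetical.

import numpy as np

def dtw_align(x, y):
    """DP (DTW) alignment of two frame sequences x and y.
    Returns a list of (i, j) pairs matching frames of x to frames of y."""
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(x[i - 1] - y[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # Backtrack through the cost matrix to recover the warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

def estimate_transforms(inputs, references, voiced_flags):
    """Pool aligned frame pairs from all adaptation utterances and estimate one
    additive log-spectral correction for voiced frames and one for unvoiced."""
    diffs = {True: [], False: []}
    for x, y, v in zip(inputs, references, voiced_flags):
        for i, j in dtw_align(x, y):
            diffs[bool(v[i])].append(y[j] - x[i])   # reference minus input
    return {k: (np.mean(d, axis=0) if d else 0.0) for k, d in diffs.items()}

def normalize(frames, voiced, transforms):
    """Apply the voiced/unvoiced corrections to new input frames."""
    out = frames.copy()
    for i, v in enumerate(voiced):
        out[i] = frames[i] + transforms[bool(v)]
    return out

if __name__ == "__main__":
    # Toy check: an input that is the reference shifted by a constant offset
    # should be pulled back toward the reference after normalization.
    rng = np.random.default_rng(0)
    ref = [rng.normal(size=(30, 16))]
    inp = [f + 0.5 for f in ref]            # simulated speaker offset
    vflags = [rng.random(30) > 0.5]
    T = estimate_transforms(inp, ref, vflags)
    print(np.allclose(normalize(inp[0], vflags[0], T), ref[0]))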
