Abstract

The paper presents the speaker normalization technique we implemented in a teaching and training system for hearing handicapped children with the goal to reduce inter-speaker variability in time-frequency speech representation. In an effort to reduce variance caused by variation in vocal tract shape among speakers, a formant based nonlinear frequency warping approach to vocal tract normalization is investigated. The proposed method can be efficiently realized in an Analysis by Synthesis framework. After the speech decomposition into the vocal tract envelope and excitation model, the vocal tract envelope is warped by the estimated frequency warping function, while the excitation characteristics are mapped to the reference speaker excitation. The results have shown significant spectral distance decrease for correctly pronounced words between test and the reference speaker after the normalization has been applied, while for poor pronunciation by the test speaker the spectral distance remains relatively high.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call