Abstract

The current method for estimating frequency warping (FW) functions for vocal tract length normalization (VTLN) is by maximizing the ASR likelihood score by an exhaustive search over a grid of FW parameters. Exhaustive search is inefficient when estimating multi-parameter FWs, which have been shown to give improvements in recognition accuracy over single parameter FWs (J.W. McDonough, 2000). Here we develop a gradient search algorithm to obtain the optimal FW parameters for MFCC features, since previous work focussed on PLP cepstral features (J.W. McDonough, 2000). The novel calculation involved was that of the gradient of the Mel filterbank with respect to the FW parameters. Even for a single parameter, the gradient search method was more efficient than grid search by a factor of around 1.6 on the average for male children speakers tested on models trained from adult males. When used to estimate multi-parameter sine-log allpass transform (SLAPT, (J.W. McDonough, 2000)) FWs for VTLN, more than 50% reduction in word error rate was obtained with five parameter SLAPT compared to single-parameter piecewise linear FW

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call