Abstract

In this paper, a novel subspace projection-based approach is explored for analyzing stressed speech. Under stress, the phonetic and speaker-specific attributes exhibit a higher variance than in neutral speech. This degrades the discrimination capability of an automatic speech recognition (ASR) system that is trained on neutral speech and tested on stressed speech. To address this mismatch, both the neutral and the stressed speech are projected onto another subspace of lower dimension. The subspace projection matrix is learned from the neutral speech using linear discriminant analysis (LDA). Since LDA-based projections are learned by maximizing the between-class scatter relative to the within-class scatter, subspace projection is expected to reduce the variance mismatch. The effect of low-rank subspace projection is explored on ASR systems employing acoustic models based on the Gaussian mixture model (GMM) as well as the deep neural network (DNN). Projecting the training and test data is found to result in significant improvements in both modeling paradigms. To further improve system performance, speaker normalization is done using feature-space maximum-likelihood linear regression (fMLLR). The subspace projection is found to result in additive improvements when combined with fMLLR. All the discussed studies are done using two different front-end speech parametrization techniques, viz. mel-frequency cepstral coefficients (MFCC) and TEO-CB-Auto-Env features. Consistent improvements are noted in both cases.
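
The projection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a textbook LDA (scatter matrices plus an eigendecomposition) in NumPy, and the synthetic data, the class labels, the dimensions, and the names `fit_lda`, `X_neutral`, and `X_stressed` are all assumptions made for illustration. The one point taken from the abstract is that the projection matrix is learned on neutral data only and then applied to both neutral and stressed data.

```python
import numpy as np


def fit_lda(X, y, n_components):
    """Learn an LDA projection matrix (dim x n_components) from labelled features."""
    dim = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = np.zeros((dim, dim))  # within-class scatter
    Sb = np.zeros((dim, dim))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        centered = Xc - mean_c
        Sw += centered.T @ centered
        diff = (mean_c - mean_all)[:, None]
        Sb += Xc.shape[0] * (diff @ diff.T)
    # Directions that maximize between-class scatter relative to within-class
    # scatter are the leading eigenvectors of pinv(Sw) @ Sb.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(eigvals.real)[::-1][:n_components]
    return eigvecs[:, order].real


# Hypothetical stand-in data: 3 classes in a 6-dimensional feature space.
rng = np.random.default_rng(0)
means = np.array([[0, 0, 0, 0, 0, 0],
                  [4, 0, 0, 0, 0, 0],
                  [0, 4, 0, 0, 0, 0]], dtype=float)
X_neutral = np.vstack([rng.normal(m, 1.0, size=(200, 6)) for m in means])
y_neutral = np.repeat(np.arange(3), 200)
# "Stressed" data is simulated here simply as the same classes with higher variance.
X_stressed = np.vstack([rng.normal(m, 2.0, size=(50, 6)) for m in means])

# Learn the projection on neutral data only, then apply it to both conditions.
W = fit_lda(X_neutral, y_neutral, n_components=2)
Z_neutral = X_neutral @ W
Z_stressed = X_stressed @ W
```

In an actual ASR front end the classes would typically be phonetic or tied-state labels and the inputs would be spliced MFCC or TEO-CB-Auto-Env frames; those details are not given in the abstract and are left out of this sketch.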

