Abstract

In this paper, a novel sparse representation over learned and exemplar dictionaries is explored to estimate the speech information in stressed speech. Stressed speech contains both speech and stress information. The stress information induces acoustic variability, which degrades the performance of speech recognition systems. In this work, the acoustic variability is reduced by representing both neutral and stressed speech in the sparse domain with respect to dictionaries that contain speech information. The K-SVD algorithm is used to learn a redundant dictionary from neutral speech. The exemplar dictionaries consist of the mean vectors of a GMM and the mean vectors of the Gaussian mixture densities in each state of an HMM, both of which are used to model neutral speech. All experiments in this work parametrize neutral and stressed speech as nonlinear (TEO-CB-Auto-Env) features. Experimental results indicate that speech information under stress conditions can be estimated efficiently when the sparse representations of neutral and stressed speech are computed over the exemplar dictionary built from the mean vectors of the Gaussian mixture densities in each state of the HMM, i.e., the time-dependent features of neutral speech. A relative improvement in word accuracy of 8.51% (from 62.14% to 67.43%) is achieved for speech under the angry condition.
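To make the idea of representing a feature vector sparsely over an exemplar dictionary concrete, the following is a minimal sketch, not the authors' implementation. It assumes orthogonal matching pursuit (OMP) as the sparse coder, which the abstract does not specify. The function name `omp`, the dictionary size, the feature dimension, and the random atoms standing in for GMM/HMM mean vectors are all hypothetical choices made for illustration.

```python
import numpy as np


def omp(D, x, n_nonzero):
    """Greedy orthogonal matching pursuit (assumed sparse coder, see lead-in).

    D: (dim, n_atoms) dictionary whose columns are atoms.
    x: (dim,) feature vector to represent.
    Returns a coefficient vector with at most n_nonzero nonzero entries.
    """
    x = np.asarray(x, dtype=float)
    coeffs = np.zeros(D.shape[1])
    residual = x.copy()
    support = []
    for _ in range(n_nonzero):
        # Select the atom most correlated with the current residual.
        correlations = np.abs(D.T @ residual)
        if support:
            correlations[support] = 0.0  # never reselect an atom
        support.append(int(np.argmax(correlations)))
        # Refit all selected atoms jointly by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        coeffs[:] = 0.0
        coeffs[support] = sol
        residual = x - D[:, support] @ sol
        if np.linalg.norm(residual) < 1e-10:
            break
    return coeffs


# Hypothetical stand-in for an exemplar dictionary: in the paper the atoms
# would be mean vectors of models trained on neutral speech; here they are
# random unit-norm columns purely for illustration.
rng = np.random.default_rng(0)
dictionary = rng.standard_normal((12, 40))
dictionary /= np.linalg.norm(dictionary, axis=0)

# Hypothetical "stressed" feature vector: two atoms plus a small perturbation.
feature = 2.0 * dictionary[:, 3] - 1.5 * dictionary[:, 7]
feature += 0.05 * rng.standard_normal(12)

sparse_code = omp(dictionary, feature, n_nonzero=3)
estimate = dictionary @ sparse_code  # reconstruction from dictionary atoms only
```

Because `estimate` is built only from dictionary atoms, whatever part of the input the atoms cannot explain is left in the residual. This is the intuition behind using dictionaries that contain speech information from neutral speech, under the assumptions stated above.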

