Abstract

We describe a dimensionality reduction method for modulation spectral features, which keeps the time-varying information of interest to the classification task. Due to the varying degrees of redundancy and discriminative power of the acoustic and modulation frequency subspaces, we first employ a generalization of SVD to tensors (Higher Order SVD) to reduce dimensions. Projection of modulation spectral features on the principal axes with the higher energy in each subspace results in a compact feature set. We further estimate the relevance of these projections to speech discrimination based on mutual information to the target class. Reconstruction of modulation spectrograms from the “best” 22 features back to the initial dimensions, shows that modulation spectral features close to syllable and phoneme rates as well as pitch values of speakers are preserved.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.