Abstract

In this paper we evaluate different features for speech activity detection (SAD). Several signal processing techniques are used to derive acoustic features that capture attributes of speech useful for distinguishing speech segments in noise. The acoustic features include short-term spectral features and long-term modulation features, both derived using Frequency Domain Linear Prediction (FDLP), and joint spectro-temporal features extracted using 2D filters on a cortical representation of speech. Posteriors of speech and non-speech from a trained multi-layer perceptron are also used as data-driven features for this task. These feature extraction techniques form part of an elaborate front-end in which information spanning several hundred milliseconds of the signal is used along with heteroscedastic linear discriminant analysis for dimensionality reduction. Processed feature outputs from the proposed front-end are used to train SAD systems based on Gaussian mixture models for processing speech from multiple languages transmitted over noisy radio communication channels under the ongoing DARPA Robust Automatic Transcription of Speech (RATS) program. The proposed front-end performs significantly better than standard acoustic feature extraction techniques in these noisy conditions.
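
To make the back end of this pipeline concrete, the following is a minimal sketch, not the authors' implementation, of two of the steps named above: splicing frames so that each feature vector spans a long temporal context, and a speech/non-speech decision from the log-likelihood ratio of two Gaussian mixture models. The function names, the diagonal-covariance assumption, the zero decision threshold, and the context widths are all illustrative assumptions. The FDLP, cortical, and MLP feature extractors and the heteroscedastic linear discriminant analysis projection are not shown.

```python
import numpy as np

def stack_context(frames, left, right):
    """Splice each frame with `left` past and `right` future frames.

    frames: array of shape (T, D). Edges are padded by repeating the
    first/last frame. Returns an array of shape (T, (left + right + 1) * D).
    """
    num_frames = frames.shape[0]
    padded = np.pad(frames, ((left, right), (0, 0)), mode="edge")
    # Block k of the output holds the frames shifted by (k - left) positions.
    return np.hstack([padded[k:k + num_frames] for k in range(left + right + 1)])

def gmm_log_likelihood(x, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM (assumed form).

    x: (T, D); weights: (M,); means and variances: (M, D).
    """
    diff = x[:, None, :] - means[None, :, :]            # (T, M, D)
    log_comp = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    log_weighted = log_comp + np.log(weights)           # (T, M)
    # Numerically stable log-sum-exp over mixture components.
    peak = log_weighted.max(axis=1, keepdims=True)
    return (peak + np.log(np.exp(log_weighted - peak).sum(axis=1, keepdims=True)))[:, 0]

def detect_speech(features, speech_gmm, nonspeech_gmm, threshold=0.0):
    """Label a frame as speech when the log-likelihood ratio exceeds `threshold`."""
    llr = (gmm_log_likelihood(features, *speech_gmm)
           - gmm_log_likelihood(features, *nonspeech_gmm))
    return llr > threshold
```

As a usage note, with a frame shift of 10 ms (an assumption, since the abstract does not state one), `stack_context(frames, 15, 15)` would give each vector a span of roughly 310 ms, in line with the "several hundred milliseconds" of context described above. In the described system a dimensionality reduction step would sit between the splicing and the GMM scoring.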
