Shift-invariant features for speech activity detection in adverse radio-frequency channel conditions

Mohamed Kamal Omar, Sriram Ganapathy

doi:10.1109/icassp.2014.6854818

Abstract

This work presents a novel approach to speech activity detection under highly degraded radio-frequency channel conditions. In this approach, the audio stream is segmented into short homogeneous segments, and each segment is represented by shift-invariant features. These features provide a coarse, histogram-based description of the high-energy trajectories in the time-frequency domain. They are less sensitive to frequency shifting than traditional filterbank-based features such as Mel-Frequency Cepstral Coefficients (MFCC) and Perceptual Linear Prediction (PLP) coefficients. We evaluate our approach on the speech activity detection task of the Robust Automatic Transcription of Speech (RATS) program. Our experiments show relative reductions of up to 29% in total error on four radio-frequency channels used in RATS, compared to the PLP-based baseline system.

Index Terms: speech activity detection, segmental modeling, invariant features
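The abstract does not specify how the histograms are computed. As a rough illustration of why a histogram over trajectory shapes can be insensitive to frequency shifting, here is a minimal sketch under stated assumptions, not the authors' feature extractor: high-energy trajectories are approximated by the few highest-energy bins in each spectrogram frame, and the descriptor is a coarse histogram of how far those peaks move in frequency from one frame to the next. The function `trajectory_slope_histogram` and its parameters `n_peaks` and `max_slope` are hypothetical.

```python
import numpy as np


def trajectory_slope_histogram(spec, n_peaks=3, max_slope=4):
    """Hypothetical shift-invariant descriptor for one homogeneous segment.

    ASSUMPTION: this is an illustrative stand-in, not the paper's features.
    spec: 2-D array (n_freq_bins, n_frames) of non-negative energies.
    Returns a normalised histogram over integer frequency slopes
    in [-max_slope, max_slope] (2 * max_slope + 1 bins).
    """
    n_freq, n_frames = spec.shape
    # Indices of the n_peaks highest-energy bins in every frame,
    # used here as a crude proxy for "high-energy trajectories".
    peaks = np.argsort(spec, axis=0)[-n_peaks:, :]  # (n_peaks, n_frames)

    slopes = []
    for t in range(n_frames - 1):
        cur, nxt = peaks[:, t], peaks[:, t + 1]
        # Signed bin distance from each current peak to every next-frame peak.
        diffs = nxt[None, :] - cur[:, None]  # (n_peaks, n_peaks)
        # Link each peak to its nearest successor and keep the signed step.
        nearest = diffs[np.arange(n_peaks), np.argmin(np.abs(diffs), axis=1)]
        slopes.extend(nearest.tolist())

    # Coarse histogram: large jumps are clipped into the outermost bins.
    slopes = np.clip(np.array(slopes), -max_slope, max_slope)
    edges = np.arange(-max_slope, max_slope + 2) - 0.5  # one bin per integer
    hist, _ = np.histogram(slopes, bins=edges)
    return hist / max(hist.sum(), 1)
```

A constant frequency shift moves every peak index by the same offset, so the frame-to-frame differences, and therefore the histogram, do not change. Per-bin filterbank energies, by contrast, would move into different coefficients. For example, embedding the same spectrogram 10 bins higher inside a zero-padded frequency axis yields an identical histogram in this sketch.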

Full Text