Abstract
This paper proposes a noise robust voice activity detection (VAD) technique called PARADE (PAR based Activity DEtection) that employs the periodic component to aperiodic component ratio (PAR). Conventional noise robust features for VAD are still sensitive to non-stationary noise, which yields variations in the signal-to-noise ratio, and sometimes requires a priori noise power estimations, although the characteristics of environmental noise change dynamically in the real world. To overcome this problem, we adopt the PAR, which is insensitive to both stationary and non-stationary noise, as an acoustic feature for VAD. By considering both periodic and aperiodic components simultaneously in the PAR, we can mitigate the effect of the non-stationarity of noise. PARADE first estimates the fundamental frequencies of the dominant periodic components of the observed signals, decomposes the power of the observed signals into the powers of its periodic and aperiodic components by taking account of the power of the aperiodic components at the frequencies where the periodic components exist, and calculates the PAR based on the decomposed powers. Then it detects the presence of target speech signals by estimating the voice activity likelihood defined in relation to the PAR. Comparisons of the VAD performance for noisy speech data confirmed that PARADE outperforms the conventional VAD algorithms even in the presence of non-stationary noise. In addition, PARADE is applied to a front-end processing technique for automatic speech recognition (ASR) that employs a robust feature extraction method called SPADE (Subband based Periodicity and Aperiodicity DEcomposition) as an application of PARADE. Comparisons of the ASR performance for noisy speech show that the SPADE front-end combined with PARADE achieves significantly higher word accuracies than those achieved by MFCC (Mel-frequency Cepstral Coefficient) based feature extraction, which is widely used for conventional ASR systems, the SPADE front-end without PARADE, and other standard noise robust front-end processing techniques (ETSI ES 202 050 and ETSI ES 202 212). This result confirmed that PARADE can improve the performance of front-end processing for ASR.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have