Abstract

The strong uncorrelating transform (SUT) provides estimates of independent components from linear mixtures using only second-order information, provided that the components have unique circularity coefficients. We propose a processing framework for generating complex-valued subbands from real-valued mixtures of speech and noise, where the objective is to control the likely values of the sample circularity coefficients of the underlying speech and noise components in each subband. We show how several processing parameters affect the noncircularity of speech-like and noise components in the subband, ultimately informing parameter choices that allow for estimation of each of the components in a subband using the SUT. Additionally, because the speech and noise components will have unique sample circularity coefficients, this statistic can be used to identify time-frequency regions that contain voiced speech. We give an example of the recovery of the circularity coefficients of a real speech signal from a two-channel noisy mixture at -25 dB SNR, which demonstrates how the estimates of noncircularity can reveal the time-frequency structure of a speech signal in very high levels of noise. Finally, we present the results of a voice activity detection (VAD) experiment showing that two new circularity-based statistics, one of which is derived from the SUT processing, can achieve improved performance over state-of-the-art VADs in real-world recordings of noise.
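To make the second-order machinery concrete, the sketch below estimates the sample circularity coefficients and a strong uncorrelating transform from the sample covariance and pseudo-covariance of a multichannel complex-valued signal. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names are ours, the Takagi factorization is obtained with the standard SVD-based construction, and the code assumes the circularity coefficients are distinct, as the SUT requires.

```python
import numpy as np
from scipy.linalg import sqrtm

def sample_circularity_coefficient(x):
    # |sample pseudo-variance| / sample variance of a zero-mean complex signal:
    # 0 for a circular signal, 1 for a maximally noncircular (e.g. real-valued) one.
    return np.abs(np.mean(x * x)) / np.mean(np.abs(x) ** 2)

def strong_uncorrelating_transform(X):
    """Estimate a SUT W for the rows of X (channels x samples), such that
    W C W^H = I and W P W^T = diag(k), where C and P are the sample covariance
    and pseudo-covariance and k holds the sample circularity coefficients
    (returned in descending order)."""
    n_ch, n_samp = X.shape
    C = X @ X.conj().T / n_samp           # sample covariance E[x x^H]
    P = X @ X.T / n_samp                  # sample pseudo-covariance E[x x^T]
    D = np.linalg.inv(sqrtm(C))           # whitening matrix C^{-1/2}
    Pw = D @ P @ D.T                      # whitened pseudo-covariance (complex symmetric)
    # Takagi factorization Pw = Q diag(k) Q^T via the SVD (distinct k assumed)
    U, k, Vh = np.linalg.svd(Pw)
    d = np.diag(U.T @ Vh.conj().T)        # diagonal unitary phases linking U and V
    Q = U * np.exp(-0.5j * np.angle(d))   # rotate columns of U so Pw = Q diag(k) Q^T
    W = Q.conj().T @ D                    # SUT; component estimates are W @ X
    return W, k
```

In a subband framework of the kind described above, such a transform would be applied per subband to the two-channel complex observations, with the resulting coefficients k (or the single-channel sample circularity coefficient) serving as the noncircularity statistic for detecting voiced speech.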
