Abstract

Sparse coding, as a successful representation method for many signals, has been recently employed in speech enhancement. This paper presents a new learning-based speech enhancement algorithm via sparse representation in the wavelet packet transform domain. We propose sparse dictionary learning procedures for training data of speech and noise signals based on a coherence criterion, for each subband of decomposition level. Using these learning algorithms, self-coherence between atoms of each dictionary and mutual coherence between speech and noise dictionary atoms are minimized along with the approximation error. The speech enhancement algorithm is introduced in two scenarios, supervised and semi-supervised. In each scenario, a voice activity detector scheme is employed based on the energy of sparse coefficient matrices when the observation data is coded over corresponding dictionaries. In the proposed supervised scenario, we take advantage of domain adaptation techniques to transform a learned noise dictionary to a dictionary adapted to noise conditions captured based on the test environment circumstances. Using this step, observation data is sparsely coded, based on the current situation of the noisy space, with low sparse approximation error. This technique has a prominent role in obtaining better enhancement results particularly when the noise is non-stationary. In the proposed semi-supervised scenario, adaptive thresholding of wavelet coefficients is carried out based on the variance of the estimated noise in each frame of different subbands. The proposed approaches lead to significantly better speech enhancement results in comparison with the earlier methods in this context and the traditional procedures, based on different objective and subjective measures as well as a statistical test.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call