Dictionary-based methods are recognized for their ability to estimate clean speech from speech contaminated with noise. However, these techniques struggle to distinguish between speech and noise components. The presented algorithm addresses this limitation with an iterative thresholding method. The thresholding is based on noise characteristics and is independent of variations in noise power. To determine the stopping criterion for thresholding, the algorithm uses the Gini Index to exploit the structure of the speech signal in the time–frequency domain. The technique extracts reliable signal components from a noisy time–frequency magnitude spectrum and performs well under both fixed and varying Signal-to-Noise Ratio (SNR) conditions. It does so without mask functions, voice activity detection, noise or mixture dictionaries, or SNR information, on which other dictionary-based methods rely. Following the thresholding process, the clean speech is estimated using a dictionary-based approach to compensate for perceptual loss. The algorithm is compared with traditional enhancement techniques in terms of Perceptual Evaluation of Speech Quality (PESQ) and Short-Time Objective Intelligibility (STOI). The assessment is conducted under several background noise conditions: babble, white, factory, and Volvo noise. The proposed algorithm outperforms the compared techniques, particularly in low and varying SNR scenarios.
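To make the role of the Gini Index as a stopping criterion concrete, the following is a minimal sketch and not the authors' implementation. It uses the standard Gini Index sparsity measure (0 for a flat vector, approaching 1 for a single nonzero entry). The function `iterative_threshold`, its parameters `step`, `max_iter`, and `tol`, and the use of the median magnitude as a noise-level proxy are all assumptions introduced for illustration; the abstract does not specify the actual thresholding rule.

```python
import numpy as np


def gini_index(x):
    """Gini Index sparsity measure of the magnitudes in x (0 = flat, near 1 = sparse)."""
    c = np.sort(np.abs(np.ravel(x)))  # magnitudes in ascending order
    n = c.size
    total = c.sum()
    if n == 0 or total == 0:
        return 0.0
    k = np.arange(1, n + 1)
    return float(1.0 - 2.0 * np.sum((c / total) * ((n - k + 0.5) / n)))


def iterative_threshold(mag, step=0.1, max_iter=50, tol=1e-4):
    """Hypothetical loop: raise a hard threshold while the Gini Index keeps increasing.

    mag  : time-frequency magnitude spectrum (any shape)
    step : threshold increment as a fraction of the median magnitude (assumed)
    tol  : minimum Gini Index gain required to accept a higher threshold (assumed)
    """
    mag = np.asarray(mag, dtype=float)
    scale = np.median(mag)  # assumed proxy for the noise level
    best = mag.copy()
    best_gini = gini_index(best)
    for i in range(1, max_iter + 1):
        threshold = i * step * scale
        candidate = np.where(mag >= threshold, mag, 0.0)  # hard thresholding
        g = gini_index(candidate)
        if g - best_gini < tol:  # sparsity no longer improves: stop
            break
        best, best_gini = candidate, g
    return best, best_gini
```

Because the threshold in this sketch is scaled by a statistic of the spectrum itself, it rescales with the input level, which loosely mirrors the abstract's claim of independence from noise power. The actual method may achieve this differently.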