Speech communication often takes place in noisy environments, where noise degrades both the intelligibility and the quality of the intended speech. This work proposes a multiple sub-frame analysis that suppresses different noise types by compensating both the magnitude and the phase spectrum of noise-degraded speech. Clean speech samples are taken from the ITU-T recommended dataset at a 16 kHz sampling rate and down-sampled to 8 kHz. Noise signals from the AURORA and NOISEX-92 datasets are added at input SNR levels of 0 dB, 5 dB, 10 dB, and 15 dB. The analysis frames are 25 ms long with a frame shift of 40%, which preserves continuity between successive speech frames. The smoothing factor for the noise update in a given sub-frame is set to 9, and the spectral floor parameter, which controls how much noise is removed from the corrupted spectrum, is set to 0.03. The phase spectrum is compensated with a compensation function that is updated together with the sub-frame analysis. The proposed approach is evaluated with objective metrics, namely segmental signal-to-noise ratio (SegSNR), mean square error (MSE), and Perceptual Evaluation of Speech Quality (PESQ) scores for specific sub-frames of speech. Quality is further assessed through a simple listening assessment and spectrogram analysis, followed by a comparison with prior noise-suppression algorithms on the corrupted speech corpus.
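The experimental setup described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it only shows how the stated parameters (8 kHz, 25 ms frames, 40% shift, spectral floor 0.03) might translate into frame sizes, how noise could be mixed at a target input SNR, and how a generic power spectral subtraction with a floor behaves. The function names are hypothetical, the 40% shift is assumed to mean a hop of 40% of the frame length, and the floor is assumed to be applied relative to the noise power estimate.

```python
import math

FS = 8000              # sampling rate after down-sampling (Hz), from the abstract
FRAME_MS = 25          # frame duration (ms), from the abstract
SHIFT_FRACTION = 0.40  # assumption: hop size is 40% of the frame length
SPECTRAL_FLOOR = 0.03  # spectral floor parameter, from the abstract


def frame_params(fs=FS, frame_ms=FRAME_MS, shift_fraction=SHIFT_FRACTION):
    """Return (frame length, hop size) in samples."""
    frame_len = int(round(fs * frame_ms / 1000))
    hop = int(round(frame_len * shift_fraction))
    return frame_len, hop


def split_frames(signal, frame_len, hop):
    """Split a sequence into overlapping frames, dropping any incomplete tail."""
    last_start = len(signal) - frame_len
    return [signal[i:i + frame_len] for i in range(0, last_start + 1, hop)]


def mix_at_snr(clean, noise, snr_db):
    """Scale noise so that the clean-to-noise power ratio equals snr_db, then add it."""
    p_clean = sum(x * x for x in clean) / len(clean)
    p_noise = sum(n * n for n in noise) / len(noise)
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [x + gain * n for x, n in zip(clean, noise)]


def subtract_with_floor(noisy_mag, noise_mag, floor=SPECTRAL_FLOOR):
    """Generic power spectral subtraction per frequency bin.

    Assumption: bins whose subtracted power falls below floor * noise power
    are clamped to that floor instead of going negative.
    """
    enhanced = []
    for y, d in zip(noisy_mag, noise_mag):
        diff = y * y - d * d
        enhanced.append(math.sqrt(max(diff, floor * d * d)))
    return enhanced
```

Under these assumptions, `frame_params()` gives 200-sample frames with an 80-sample hop, so consecutive frames overlap by 60%. The phase compensation and sub-frame noise update described in the abstract are not covered by this sketch.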