Abstract

Time-frequency masking combined with spectral phase estimation may help recover the intelligibility and quality of speech degraded by background noise. The proposed unsupervised speech enhancement method can substantially reduce noise in nonstationary and otherwise difficult noisy backgrounds. During signal reconstruction, it replaces the spectral phase of the noisy speech with an estimated spectral phase and combines it with a novel time-frequency mask. To estimate the mask, variance-based features are extracted and compared against an unsupervised, nonparametric adaptive threshold: features that satisfy the threshold condition are retained, whereas those that violate it are discarded. The resulting time-frequency mask is then used to obtain the enhanced speech. For phase estimation, the noisy phase is decomposed into the spectrum of the instantaneous noisy phase, followed by temporal smoothing to reduce its variations. Results show considerable improvements in short-time objective intelligibility (STOI), perceptual evaluation of speech quality (PESQ), segmental signal-to-noise ratio (SSNR), and speech distortion.
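The abstract gives no implementation details, so the following is only a minimal NumPy sketch of the kind of pipeline it describes, not the authors' method. Every specific choice is an assumption: the STFT settings, the variance of log-magnitude over neighbouring frames as the "variance-based feature", Otsu's method as the unsupervised nonparametric threshold, and a circular moving average of the frame-to-frame phase deviation as the temporal phase smoothing. All function names are hypothetical.

```python
import numpy as np

# Minimal sketch, NOT the paper's exact method: the variance feature, Otsu threshold,
# and phase-smoothing details below are illustrative assumptions.

def stft(x, n_fft=512, hop=128):
    """Hann-windowed STFT; returns a complex array of shape (n_fft // 2 + 1, n_frames)."""
    win = np.hanning(n_fft)
    padded = np.pad(x, (n_fft, n_fft))  # zero-pad so the signal edges are fully covered
    n_frames = 1 + (len(padded) - n_fft) // hop
    frames = np.stack([padded[i * hop:i * hop + n_fft] * win for i in range(n_frames)], axis=1)
    return np.fft.rfft(frames, axis=0)

def istft(X, length, n_fft=512, hop=128):
    """Weighted overlap-add inverse of stft(); returns `length` samples."""
    win = np.hanning(n_fft)
    frames = np.fft.irfft(X, n=n_fft, axis=0) * win[:, None]
    total = n_fft + hop * (X.shape[1] - 1)
    y, norm = np.zeros(total), np.zeros(total)
    for i in range(X.shape[1]):
        y[i * hop:i * hop + n_fft] += frames[:, i]
        norm[i * hop:i * hop + n_fft] += win ** 2
    return (y / np.maximum(norm, 1e-8))[n_fft:n_fft + length]

def variance_features(mag, context=5):
    """Assumed feature: per-bin variance of log-magnitude over `context` (odd) frames."""
    half = context // 2
    logmag = np.pad(np.log(mag + 1e-10), ((0, 0), (half, half)), mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(logmag, context, axis=1)
    return windows.var(axis=-1)

def otsu_threshold(values, n_bins=256):
    """Otsu's method, standing in for an unsupervised, nonparametric threshold:
    choose the histogram split that maximises the between-class variance."""
    hist, edges = np.histogram(values, bins=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(hist)            # count in the lower class for each candidate split
    w1 = w0[-1] - w0                # count in the upper class
    s0 = np.cumsum(hist * centers)
    m0 = s0 / np.maximum(w0, 1)
    m1 = (s0[-1] - s0) / np.maximum(w1, 1)
    between = w0 * w1 * (m0 - m1) ** 2
    return edges[1:][np.argmax(between)]

def smoothed_phase(X, n_fft=512, hop=128, smooth=3):
    """Assumed phase estimate: split the noisy phase into a frame-to-frame instantaneous
    deviation, smooth it over `smooth` (odd) frames with a circular mean, re-integrate."""
    phase = np.angle(X)
    k = np.arange(X.shape[0])[:, None]
    expected = 2 * np.pi * k * hop / n_fft  # nominal phase advance of bin k per hop
    dev = np.angle(np.exp(1j * (np.diff(phase, axis=1) - expected)))  # wrapped deviation
    half = smooth // 2
    unit = np.pad(np.exp(1j * dev), ((0, 0), (half, half)), mode="edge")
    dev = np.angle(np.lib.stride_tricks.sliding_window_view(unit, smooth, axis=1).mean(axis=-1))
    steps = np.cumsum(dev + expected, axis=1)
    return np.concatenate([phase[:, :1], phase[:, :1] + steps], axis=1)

def enhance(noisy, n_fft=512, hop=128):
    """Binary TF mask from thresholded variance features, combined with the smoothed phase."""
    X = stft(noisy, n_fft, hop)
    feats = variance_features(np.abs(X))
    mask = (feats >= otsu_threshold(feats)).astype(float)  # retain bins meeting the threshold
    Y = mask * np.abs(X) * np.exp(1j * smoothed_phase(X, n_fft, hop))
    return istft(Y, len(noisy), n_fft, hop), mask

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(16000) / 16000
    clean = np.sin(2 * np.pi * 440 * t) * (t > 0.5)  # toy signal: a tone that switches on
    noisy = clean + 0.3 * rng.standard_normal(t.size)
    enhanced, mask = enhance(noisy)
    print(enhanced.shape, mask.mean())
```

The binary mask keeps the "retain or discard" rule from the text literally; a soft mask or a per-frequency threshold would be equally plausible readings. Setting `smooth=1` disables the phase smoothing and returns the noisy phase (modulo 2π), which makes it easy to isolate the effect of each stage.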
