Abstract

This paper proposes a speech enhancement algorithm for Thai, a tonal language, based on a Markov random field (MRF) model. The aim is to remove noise while preserving speech, in particular its harmonic information, because f0 patterns are important perceptual cues for lexical tones. First, the noisy speech signal is transformed using the short-time Fourier transform (STFT). A voice activity detector then classifies each STFT time frame as voiced or unvoiced. In voiced frames, harmonic information is exploited: the four neighbors of each analyzed STFT coefficient are the coefficients in the adjacent time frames (left, right) and at the nearest harmonics (top, bottom). In unvoiced frames, the four adjacent coefficients (left, right, top, and bottom) are used. A two-state MRF model then classifies each STFT coefficient as speech or noise; coefficients labeled as speech are retained, and the rest are set to zero. The enhanced speech is obtained by the inverse STFT. In a quality evaluation on four sets of Thai rhyming words corrupted by white noise at SNR levels of 0, 5, and 10 dB, the proposed algorithm significantly improved the SNR of the noisy speech compared with spectral subtraction (1.3 dB on average) and Wiener filtering (1.9 dB on average).
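
The pipeline described in the abstract can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not give the MRF energy or the inference method, so the sketch assumes zero-mean complex-Gaussian likelihoods for the two states (known noise variance, speech variance set by a hypothetical prior SNR `xi`), an Ising-type smoothness prior with a hypothetical weight `beta`, and iterated conditional modes (ICM) for inference. The voice activity detector and f0 tracker are not shown: the hypothetical `f0_bins` argument stands in for their output (f0 in STFT bins per frame, 0 for unvoiced frames, or one scalar for all frames), and the nearest harmonics of bin k are approximated as k ± f0. All function names, the periodic Hann window, and the 50% overlap are likewise illustrative choices.

```python
import numpy as np


def hann(n_fft):
    """Periodic Hann window (illustrative choice)."""
    return np.hanning(n_fft + 1)[:-1]


def stft(x, n_fft=256, hop=128):
    """Hann-windowed STFT, shape (n_fft//2 + 1, n_frames). x is zero-padded by
    n_fft on both sides so every original sample lies under two frames."""
    xp = np.pad(x, n_fft)
    n_frames = 1 + (len(xp) - n_fft) // hop
    frames = np.stack([xp[i * hop:i * hop + n_fft] for i in range(n_frames)], axis=1)
    return np.fft.rfft(frames * hann(n_fft)[:, None], axis=0)


def istft(X, length, n_fft=256, hop=128):
    """Weighted overlap-add inverse of stft(); returns `length` original samples."""
    win = hann(n_fft)
    frames = np.fft.irfft(X, n=n_fft, axis=0) * win[:, None]
    out = np.zeros(n_fft + hop * (X.shape[1] - 1))
    norm = np.zeros_like(out)
    for i in range(X.shape[1]):
        out[i * hop:i * hop + n_fft] += frames[:, i]
        norm[i * hop:i * hop + n_fft] += win ** 2
    return (out / np.maximum(norm, 1e-8))[n_fft:n_fft + length]


def neighbors(k, t, f0, n_bins, n_frames):
    """Four-neighbourhood of coefficient (k, t): left/right frames, plus
    k +/- f0 in voiced frames (f0 > 0) or k +/- 1 in unvoiced frames (f0 == 0).
    Assumption: the nearest harmonics are approximated as k +/- f0."""
    step = int(f0[t]) if f0[t] > 0 else 1
    cand = [(k, t - 1), (k, t + 1), (k - step, t), (k + step, t)]
    return [(kk, tt) for kk, tt in cand if 0 <= kk < n_bins and 0 <= tt < n_frames]


def mrf_mask(X, f0_bins, noise_power, xi=10.0, beta=1.0, n_iter=5):
    """Two-state MRF labelling (0 = noise, 1 = speech) by iterated conditional modes."""
    y = np.abs(X) ** 2
    n_bins, n_frames = y.shape
    f0 = np.broadcast_to(np.asarray(f0_bins, dtype=int), (n_frames,))
    # Assumed per-state variances: noise, and noise scaled by (1 + prior SNR xi).
    var = np.array([noise_power, noise_power * (1.0 + xi)])
    # Negative log-likelihood of a zero-mean complex Gaussian coefficient in each state.
    cost = np.log(var)[:, None, None] + y[None] / var[:, None, None]
    labels = np.argmin(cost, axis=0)  # maximum-likelihood initialisation
    for _ in range(n_iter):
        changed = 0
        for t in range(n_frames):
            for k in range(n_bins):
                nb = [labels[kk, tt] for kk, tt in neighbors(k, t, f0, n_bins, n_frames)]
                # Assumed Ising-type prior: cost beta per neighbour with the other label.
                e0 = cost[0, k, t] + beta * sum(l != 0 for l in nb)
                e1 = cost[1, k, t] + beta * sum(l != 1 for l in nb)
                new = int(e1 < e0)
                changed += int(new != labels[k, t])
                labels[k, t] = new
        if changed == 0:
            break
    return labels


def enhance(x, f0_bins, noise_var, n_fft=256, hop=128, **mrf_kwargs):
    """Keep speech-labelled STFT coefficients, zero the rest, and resynthesise."""
    X = stft(x, n_fft, hop)
    noise_power = noise_var * np.sum(hann(n_fft) ** 2)  # E|X|^2 of white noise
    mask = mrf_mask(X, f0_bins, noise_power, **mrf_kwargs)
    return istft(X * mask, len(x), n_fft, hop)


def snr_db(clean, estimate):
    """Global SNR of an estimate relative to the clean reference, in dB."""
    return 10 * np.log10(np.sum(clean ** 2) / np.sum((clean - estimate) ** 2))
```

For instance, a 125 Hz harmonic signal sampled at 8 kHz has f0 exactly on bin 4 when `n_fft = 256`, so `enhance(noisy, 4, noise_var)` treats every frame as voiced and links each coefficient to the coefficients four bins above and below it. The hard binary mask follows the abstract (speech coefficients kept, the rest set to zero); in this sketch it is the MRF prior that stops isolated noise peaks from surviving the mask.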
