Abstract

We present a speech denoising algorithm based on regularized non-negative matrix factorization (NMF), in which several constraints describe the background noise in a generic way. The observed spectrogram is decomposed into four signal contributions: the voiced speech source and three generic types of noise. The speech signal is represented by a source/filter model that captures only voiced speech; its filter bases are trained on a database of individual phonemes, resulting in a small dictionary of phoneme envelopes. The three remaining terms represent the background noise as a sum of smooth noise, impulsive noise and pitched noise, each characterized individually by specific spectro-temporal constraints based on sparseness and smoothness restrictions. The method was evaluated on the development dataset of the 3rd CHiME Speech Separation and Recognition Challenge and compared with conventional semi-supervised NMF with sparse activations. Our experiments show that, with a similar number of bases, source/filter modeling of speech in conjunction with the proposed noise constraints produces better separation results than sparse training of speech bases, even though the system is designed only for voiced speech and the results may still not be practical for many applications.
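
For orientation, the comparison baseline named in the abstract (semi-supervised NMF with sparse activations) can be sketched as follows. This is a minimal illustration, not the authors' implementation and not the proposed four-term model. The abstract does not state the cost function or update rules, so the Kullback-Leibler divergence, the multiplicative updates, the L1 penalty on speech activations, and the soft-mask reconstruction are all assumptions, as are the function name `separate_speech` and its parameters.

```python
import numpy as np


def separate_speech(V, W_speech, n_noise=8, sparsity=0.1, n_iter=100,
                    eps=1e-12, seed=0):
    """Hypothetical sketch of semi-supervised NMF with sparse activations.

    V        : non-negative magnitude spectrogram, shape (n_freq, n_frames)
    W_speech : pre-trained speech bases, kept fixed, shape (n_freq, n_speech)
    Assumed cost: KL divergence plus an L1 penalty on speech activations.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    n_speech = W_speech.shape[1]

    # Noise bases and all activations start random and are learned from V.
    W_noise = rng.random((n_freq, n_noise)) + eps
    H = rng.random((n_speech + n_noise, n_frames)) + eps

    # The sparsity weight applies to the speech rows of H only (assumption).
    penalty = np.concatenate(
        [np.full(n_speech, sparsity), np.zeros(n_noise)]
    )[:, None]
    ones = np.ones_like(V)

    for _ in range(n_iter):
        W = np.hstack([W_speech, W_noise])

        # Multiplicative KL update for all activations.
        ratio = V / (W @ H + eps)
        H *= (W.T @ ratio) / (W.T @ ones + penalty + eps)

        # Multiplicative KL update for the noise bases; speech bases stay fixed.
        ratio = V / (W @ H + eps)
        H_noise = H[n_speech:]
        W_noise *= (ratio @ H_noise.T) / (ones @ H_noise.T + eps)

    # Soft mask: share of the modelled energy explained by the speech bases.
    W = np.hstack([W_speech, W_noise])
    mask = (W_speech @ H[:n_speech]) / (W @ H + eps)
    return mask * V, mask
```

In this sketch the speech dictionary is learned beforehand and only the noise bases adapt to the mixture, which is what makes the scheme semi-supervised. The method described in the abstract differs by replacing the trained speech bases with a source/filter model and by constraining three separate noise terms.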

