Abstract

This paper proposes a basis training algorithm for discriminative non-negative matrix factorization (NMF) with applications to single-channel audio source separation. In an NMF-based approach to supervised audio source separation, NMF is first applied to train the basis spectra of each source using training examples and is then applied at test time to the spectrogram of a mixture signal using the pretrained basis spectra. The source signals can then be separated out using a Wiener filter. A typical way to train the basis spectra is to minimize a dissimilarity measure between the observed spectrogram and the NMF model. However, obtaining the basis spectra in this way does not ensure that the separated signal will be optimal at test time, because the objective functions for training and for separation (Wiener filtering) are inconsistent. To address this mismatch, a framework called discriminative NMF (DNMF) has recently been proposed. While this framework is noteworthy in that it uses a common objective function for training and separation, the objective function becomes more analytically complex than that of regular NMF. In the original DNMF work, a multiplicative update algorithm was proposed for the basis training; however, convergence of that algorithm is not guaranteed, and it can be very slow. To overcome this weakness, this paper proposes a convergence-guaranteed algorithm for DNMF based on a majorization-minimization principle. Experimental results show that the proposed algorithm outperforms the conventional DNMF algorithm as well as the regular NMF algorithm in terms of both the signal-to-distortion and signal-to-interference ratios.
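The mismatch between the training and separation objectives can be written out as a rough sketch. The notation is an assumption introduced here for illustration and does not appear in the abstract: $V_s$ is the training spectrogram of source $s$, $W_s$ and $H_s$ are its basis and activation matrices, $M$ is the mixture spectrogram, $D$ is a dissimilarity measure, and the fraction and $\odot$ are element-wise. The last line only paraphrases the idea of a discriminative criterion; the exact objective used in the DNMF work may differ.

```latex
% Regular NMF training: fit each source model to its own training data
\hat{W}_s = \arg\min_{W_s \ge 0} \; \min_{H_s \ge 0} \; D\!\left(V_s \,\middle\|\, W_s H_s\right)

% Separation at test time: Wiener-type filtering of the mixture
\hat{S}_s = \frac{W_s H_s}{\sum_{s'} W_{s'} H_{s'}} \odot M

% Discriminative idea (sketch): train the bases on the error of the
% separated signal itself rather than on the fit of W_s H_s
\min_{W_1, \dots, W_S \ge 0} \; \sum_{s} D\!\left(V_s \,\middle\|\, \hat{S}_s\right)
```

The first criterion never involves $\hat{S}_s$, which is why a basis that is optimal for it need not be optimal for the separated output.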

Highlights

  • Single-channel audio source separation is a challenging task of extracting individual source signals from a monaural recording of a mixture signal

  • Non-negative matrix factorization (NMF) is applied to the spectrogram of a test mixture signal, where each subset of the basis spectra is fixed at the pretrained spectra

  • Discriminative NMF (DNMF) is noteworthy in that it directly uses the reconstruction errors of separated signals as the training criteria, which eliminates the inconsistency between the objective functions for training and separation in the conventional NMF method and can increase the discriminative power of the trained basis


Summary

Introduction

Single-channel audio source separation is the challenging task of extracting individual source signals from a monaural recording of a mixture signal. One successful approach to monaural audio source separation involves applications of non-negative matrix factorization (NMF) [6], [10]. The basic idea of the NMF approach is to interpret the observed magnitude (or power) spectrogram of a signal as a non-negative matrix and factorize it into the product of two non-negative matrices. This amounts to approximating the observed spectra by a linear sum of basis spectra scaled by time-varying amplitudes. In a supervised/semi-supervised source separation problem setting, NMF is first used to train the basis spectra of each sound source using individually recorded audio samples. A typical way to train the basis spectra of each source is to minimize a divergence measure between the NMF model and the spectrogram of the training samples.
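The supervised pipeline described above (train basis spectra per source, then factorize the mixture with the bases held fixed, then filter) can be sketched as follows. This is a minimal illustration of regular NMF, not the DNMF algorithm the paper proposes. It assumes the generalized Kullback-Leibler divergence with the standard multiplicative updates, which the text does not specify, and the names `kl_nmf` and `separate` are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def kl_nmf(V, n_basis, n_iter=200, W_fixed=None, seed=0):
    """Factorize V (freq x frames) as W @ H with multiplicative updates
    for the generalized KL divergence (an assumed choice of divergence).
    If W_fixed is given, only the activations H are updated."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    if W_fixed is None:
        W = rng.random((n_freq, n_basis)) + EPS
    else:
        W = W_fixed
    H = rng.random((W.shape[1], n_frames)) + EPS
    ones = np.ones_like(V)
    for _ in range(n_iter):
        WH = W @ H + EPS
        H *= (W.T @ (V / WH)) / (W.T @ ones + EPS)
        if W_fixed is None:
            WH = W @ H + EPS
            W *= ((V / WH) @ H.T) / (ones @ H.T + EPS)
    return W, H


def separate(V_mix, bases, n_iter=200):
    """Estimate each source spectrogram from the mixture, keeping the
    pretrained bases fixed and applying a Wiener-like soft mask."""
    W = np.hstack(bases)  # concatenate the pretrained bases of all sources
    _, H = kl_nmf(V_mix, W.shape[1], n_iter=n_iter, W_fixed=W)
    model = W @ H + EPS   # full mixture model
    estimates, start = [], 0
    for W_s in bases:
        stop = start + W_s.shape[1]
        mask = (W_s @ H[start:stop]) / model  # this source's share of the model
        estimates.append(mask * V_mix)
        start = stop
    return estimates
```

A usage sketch with two sources would train `W1, _ = kl_nmf(V1, 4)` and `W2, _ = kl_nmf(V2, 4)` on individually recorded spectrograms and then call `separate(V_mix, [W1, W2])`. Because the masks sum to one at every time-frequency bin, the estimates add back up to the mixture spectrogram; nothing in the training step, however, accounts for how the masks will be used.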

