Single Channel Audio Source Separation Research Articles

This paper proposes a basis training algorithm for discriminative non-negative matrix factorization (NMF) with applications to single-channel audio source separation. In an NMF-based approach to supervised audio source separation, NMF is first applied to training examples of each source to learn its basis spectra. At test time, NMF is applied to the spectrogram of a mixture signal using the pretrained basis spectra. The source signals can then be separated out using a Wiener filter. A typical way to train the basis spectra is to minimize a dissimilarity measure between the observed spectrogram and the NMF model. However, basis spectra obtained this way are not guaranteed to yield optimal separated signals at test time, because the objective function used for training differs from the one used for separation (Wiener filtering). To address this mismatch, a framework called discriminative NMF (DNMF) has recently been proposed. This framework is noteworthy because it uses a common objective function for training and separation. However, that objective function is analytically more complex than the one used in regular NMF. In the original DNMF work, a multiplicative update algorithm was proposed for basis training, but its convergence is not guaranteed and can be very slow. To overcome this weakness, this paper proposes a convergence-guaranteed algorithm for DNMF based on the majorization-minimization principle. Experimental results show that the proposed algorithm outperforms both the conventional DNMF algorithm and the regular NMF algorithm in terms of signal-to-distortion and signal-to-interference ratios.
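
The following Python sketch shows the supervised pipeline described above. It learns per-source bases from training examples, fits activations to the mixture with those bases held fixed, and then applies a Wiener filter. This is a minimal sketch of the regular NMF baseline only. It does not implement DNMF or the majorization-minimization algorithm this paper proposes. The sketch assumes magnitude spectrograms as input, the generalized Kullback-Leibler divergence as the dissimilarity measure, and the Lee-Seung multiplicative updates. The function names `nmf_kl` and `separate` are hypothetical.

```python
import numpy as np

# Hypothetical helpers: a sketch of the regular-NMF baseline with
# Wiener-style masking, NOT the paper's DNMF / majorization-minimization method.
EPS = 1e-12  # guards against division by zero


def nmf_kl(V, n_components, n_iter=200, W_fixed=None, seed=0):
    """Factor a non-negative matrix V (freq x time) as W @ H by minimizing
    the generalized KL divergence with Lee-Seung multiplicative updates.
    If W_fixed is given, the bases are held fixed and only H is learned."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    if W_fixed is None:
        W = rng.random((n_freq, n_components)) + EPS
    else:
        W = W_fixed
    H = rng.random((W.shape[1], n_frames)) + EPS
    for _ in range(n_iter):
        # H <- H * (W^T (V / WH)) / (W^T 1); W^T 1 is the column sums of W
        H *= (W.T @ (V / (W @ H + EPS))) / (W.sum(axis=0)[:, None] + EPS)
        if W_fixed is None:
            # W <- W * ((V / WH) H^T) / (1 H^T); 1 H^T is the row sums of H
            W *= ((V / (W @ H + EPS)) @ H.T) / (H.sum(axis=1)[None, :] + EPS)
    return W, H


def separate(mix_stft, bases, n_iter=200):
    """Fit activations for the mixture with the concatenated pretrained
    bases held fixed, then apply the soft mask W_k H_k / sum_j W_j H_j
    to the mixture for each source k."""
    mag = np.abs(mix_stft)
    W = np.hstack(bases)
    _, H = nmf_kl(mag, W.shape[1], n_iter=n_iter, W_fixed=W)
    models, start = [], 0
    for W_k in bases:
        stop = start + W_k.shape[1]
        models.append(W_k @ H[start:stop])  # source k's part of the model
        start = stop
    total = sum(models) + EPS
    return [(m / total) * mix_stft for m in models]
```

The soft masks are computed from the magnitude of `mix_stft` but multiplied with `mix_stft` itself. As a result, passing a complex mixture STFT yields complex source estimates that can be inverted with an inverse STFT. Because the masks for all sources sum to one (up to `EPS`), the estimates add back up to the mixture.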

The harmonic model is widely used in single-channel audio source separation. It has proven effective for music source separation, where the harmonic peaks of the different sources differ greatly from one another. In speech analysis, however, the short time window causes harmonics to overlap in the frequency domain. To overcome this shortcoming, we propose a long-short frame associated harmonic (LSAH) model to separate two speech sources from a single-channel recording. The long frame achieves high harmonic resolution, while the short frame preserves the short-time stationarity of the speech signal. The two are used jointly to improve the accuracy of multi-pitch estimation. The autocorrelation method is adopted to estimate the prominent pitch because it is simple and accurate. The LSAH model and the prominent pitch are then used to determine the state of the mixture and to estimate the other pitch candidate. Our method thus guarantees both high harmonic resolution and short-time stationarity of the speech signal. Furthermore, it can separate some unvoiced segments of the mixture, which many existing methods cannot handle. Experiments on 30 groups of mixtures show that the proposed algorithm outperforms the standard short-time harmonic model in terms of both signal-to-noise ratio (SNR) and subjective listening quality.
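
The trade-off between long and short frames is easy to quantify. At an assumed 16 kHz sampling rate, a 512-sample (32 ms) frame gives DFT bins spaced 31.25 Hz apart. A 2048-sample (128 ms) frame gives 7.8125 Hz spacing, which is four times finer but assumes stationarity over four times the duration. The prominent-pitch step can be illustrated with a textbook autocorrelation estimator applied to a single frame. This minimal sketch does not implement the LSAH model itself. The function name `autocorr_pitch`, the 60 to 400 Hz search range, and the use of the unnormalized (biased) autocorrelation are illustrative assumptions.

```python
import numpy as np


def autocorr_pitch(frame, fs, f_min=60.0, f_max=400.0):
    """Hypothetical helper: estimate the dominant pitch (Hz) of one frame
    from the highest autocorrelation peak within lags [fs/f_max, fs/f_min].
    Returns 0.0 for silent frames or when the frame is too short."""
    x = np.asarray(frame, dtype=float)
    x = x - x.mean()  # remove DC so it does not dominate the autocorrelation
    n = len(x)
    # Biased autocorrelation r[k] = sum_i x[i] * x[i + k] for lags k >= 0
    r = np.correlate(x, x, mode="full")[n - 1:]
    lag_min = int(np.ceil(fs / f_max))
    lag_max = min(int(fs / f_min), n - 1)
    if r[0] <= 0.0 or lag_max <= lag_min:
        return 0.0
    best_lag = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
    return fs / best_lag
```

The biased autocorrelation tapers with lag, which tends to favor the fundamental period over its multiples. Returning `fs / best_lag` quantizes the estimate to integer lags. Parabolic interpolation around the peak is a common refinement.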

Articles published on Single Channel Audio Source Separation

Speech Enhancement using Convolutional Autoencoder Network

SFSRNet: Super-resolution for Single-Channel Audio Source Separation

Majorization-Minimization Algorithm for Discriminative Non-Negative Matrix Factorization

Two-Stage Single-Channel Audio Source Separation Using Deep Neural Networks

On-the-Fly Audio Source Separation—A Novel User-Friendly Framework

A multiresolution non-negative tensor factorization approach for single channel sound source separation

Single-channel speech separation based on long–short frame associated harmonic model
