Abstract
Monaural source separation is often conducted by manipulating the amplitude spectrogram of a mixture (e.g., via time-frequency masking or spectral subtraction). The obtained amplitudes are converted back to the time domain by using the phase of the mixture or by applying phase reconstruction. Although phase reconstruction performs well when the true amplitudes are given, its performance degrades when the amplitudes contain errors. To deal with this problem, we propose an optimization-based method that refines both the amplitudes and the phases, starting from the given amplitudes. It aims to find time-domain signals whose amplitude spectrograms are close to the given ones in terms of the generalized alpha-beta divergences. The optimization problem is solved with the alternating direction method of multipliers (ADMM). We confirmed the effectiveness of the proposed method through speech-nonspeech separation in various conditions.
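To make the closeness measure concrete, the following is a minimal sketch of an alpha-beta divergence between two nonnegative spectrograms. It assumes the standard AB-divergence definition for the generic case where alpha, beta, and alpha + beta are all nonzero; the function name `ab_divergence`, the argument order, and the `eps` floor are illustrative choices, not taken from the paper.

```python
import numpy as np

def ab_divergence(p, q, alpha, beta, eps=1e-12):
    """AB-divergence D(p || q) for alpha, beta, and alpha + beta all nonzero.

    Assumed definition (generic case only):
      D = -1/(alpha*beta) * sum(p^alpha * q^beta
                                - alpha/(alpha+beta) * p^(alpha+beta)
                                - beta/(alpha+beta)  * q^(alpha+beta))
    """
    if alpha == 0 or beta == 0 or alpha + beta == 0:
        raise ValueError("this sketch omits the limiting cases")
    # Floor the inputs so that negative exponents stay finite (illustrative choice).
    p = np.maximum(np.asarray(p, dtype=float), eps)
    q = np.maximum(np.asarray(q, dtype=float), eps)
    s = alpha + beta
    terms = p**alpha * q**beta - (alpha / s) * p**s - (beta / s) * q**s
    return -np.sum(terms) / (alpha * beta)
```

With alpha = beta = 1 this reduces to half the squared Euclidean distance, which gives a quick sanity check. The limiting cases (for example, the Kullback-Leibler and Itakura-Saito divergences) need separate formulas and are left out here.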
Highlights
Monaural source separation (MSS) aims to decompose a single-channel mixture signal into its individual source signals.
The Griffin–Lim algorithm (GLA) modifies the phase of each separated signal based on the short-time Fourier transform (STFT) consistency: the reconstructed complex STFT coefficients should retain the neighborhood relation caused by the overlapping windows of the STFT [12].
Multiple input spectrogram inversion (MISI) [13] further considers the mixture consistency [17]: the sum of the separated signals should coincide with the mixture.
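The two consistency ideas above can be sketched in a few lines of NumPy. This is a simplified MISI-style iteration, not the authors' implementation: the helper names `stft`, `istft`, and `misi`, the window and hop handling, and the iteration count are all assumptions made for illustration. The sketch also assumes the signal length minus the window length is a multiple of the hop, and it does not treat the signal edges specially.

```python
import numpy as np

def stft(x, win, hop):
    """Frame x, apply the window, and take a one-sided FFT of each frame."""
    L = len(win)
    n_frames = 1 + (len(x) - L) // hop
    idx = hop * np.arange(n_frames)[:, None] + np.arange(L)[None, :]
    return np.fft.rfft(x[idx] * win, axis=1)

def istft(X, win, hop, length):
    """Windowed overlap-add, normalized by the summed squared window."""
    L = len(win)
    frames = np.fft.irfft(X, n=L, axis=1) * win
    x = np.zeros(length)
    norm = np.zeros(length)
    for m, frame in enumerate(frames):
        x[m * hop:m * hop + L] += frame
        norm[m * hop:m * hop + L] += win ** 2
    # The floor avoids division by zero where the window vanishes (edges).
    return x / np.maximum(norm, 1e-8)

def misi(mixture, amplitudes, win, hop, n_iter=50):
    """MISI-style phase reconstruction; amplitudes are [frames, bins] arrays."""
    n_src = len(amplitudes)
    T = len(mixture)
    # Start every source from the mixture phase.
    phase = np.exp(1j * np.angle(stft(mixture, win, hop)))
    specs = [a * phase for a in amplitudes]
    for _ in range(n_iter):
        signals = [istft(S, win, hop, T) for S in specs]
        # Mixture consistency: share the residual equally among the sources.
        residual = (mixture - sum(signals)) / n_src
        # STFT consistency: re-analyze the corrected signals and keep only
        # their phases; the given amplitudes are never changed.
        specs = [a * np.exp(1j * np.angle(stft(s + residual, win, hop)))
                 for a, s in zip(amplitudes, signals)]
    return [istft(S, win, hop, T) for S in specs]
```

Because the amplitudes are held fixed throughout the loop, any error in them is carried into the output unchanged.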
Summary
Monaural source separation (MSS) aims to decompose a single-channel mixture signal into its individual source signals. When the separated amplitudes are converted back to the time domain with the phase of the mixture, the obtained signals contain interference even when the amplitudes are ideally separated. To tackle this problem, various phase reconstruction methods have been presented [10]–[16]. In MSS, however, the estimated amplitudes often contain errors, which significantly impairs the performance of MISI. This is because MISI keeps the given amplitudes fixed and only attempts to reconstruct phases that are appropriate for those amplitudes in terms of the STFT and mixture consistencies. The proposed method is instead formulated as an optimization problem that seeks separated time-domain signals whose amplitude spectrograms are close to the given ones, with the mixture consistency acting as a regularization. The effectiveness and robustness of the proposed method were confirmed by speech-nonspeech separation using various amplitude estimation methods.
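As a rough illustration of what such a problem can look like, one plausible form is written below. The notation is assumed rather than taken from the paper: y is the mixture, x_n are the N separated time-domain signals, A_n are the given amplitude spectrograms, \mathcal{S} is the STFT, D_{AB}^{(\alpha,\beta)} is an alpha-beta divergence, and \lambda > 0 is a hypothetical regularization weight. The quadratic penalty and the order of the divergence arguments are also assumptions; the exact objective used in the paper may differ.

```latex
\min_{x_1,\dots,x_N}\;
  \sum_{n=1}^{N} D_{AB}^{(\alpha,\beta)}\!\left( A_n \,\middle\|\, \left|\mathcal{S}(x_n)\right| \right)
  \;+\; \frac{\lambda}{2} \left\| y - \sum_{n=1}^{N} x_n \right\|_2^2
```

The first term lets the amplitudes of the estimates deviate from the given A_n at a cost, instead of fixing them as MISI does, and the second term penalizes estimates whose sum departs from the mixture.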