Multiple Speech Source Separation Using Inter-Channel Correlation and Relaxed Sparsity

Maoshen Jia, Jundai Sun, Xiguang Zheng

doi:10.3390/app8010123

Open Access

https://doi.org/10.3390/app8010123

Abstract

In this work, a multiple speech source separation method using inter-channel correlation and relaxed sparsity is proposed. A B-format microphone with four spatially located channels is adopted, both because of the size of the microphone array and to preserve the spatial parameter integrity of the original signal. Specifically, we first measure the proportion of overlapped components among multiple sources and find that many time-frequency (TF) components overlap as the number of sources increases. Then, considering the relaxed sparsity of speech sources, we propose a dynamic threshold-based approach for separating sparse components, where the threshold is determined by the inter-channel correlation among the recorded signals. After a statistical analysis of the number of active sources at each TF instant, a form of relaxed sparsity called the half-K assumption is proposed, under which the number of active sources in a given TF bin does not exceed half the total number of simultaneously occurring sources. Under the half-K assumption, the non-sparse components are recovered by using the extracted sparse components as a guide, combined with vector decomposition and matrix factorization. The final TF coefficients of each source are then obtained by synthesizing the sparse and non-sparse components. The proposed method has been evaluated using up to six simultaneous speech sources under both anechoic and reverberant conditions. Both objective and subjective evaluations show that the perceptual quality of the speech separated by the proposed approach exceeds that of existing blind source separation (BSS) approaches. The method is also robust to different speech signals and yields similar perceptual quality across all separated sources.
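The half-K assumption and the overlap measurement described in the abstract can be illustrated with a minimal sketch. It assumes oracle access to the TF magnitudes of each individual source (as in a controlled experiment), and it treats a source as "active" in a bin when its magnitude reaches a fraction of the strongest source in that bin. That activity criterion, the `rel_threshold` parameter, and all function names are assumptions made for illustration; they are not the statistical analysis used in the paper.

```python
from typing import List, Sequence


def active_source_counts(mags: Sequence[Sequence[float]],
                         rel_threshold: float = 0.1) -> List[int]:
    """Count active sources per TF bin.

    mags[k][i] is the magnitude of source k at (flattened) TF bin i.
    A source counts as active when its magnitude is at least
    rel_threshold times the largest magnitude in that bin
    (a hypothetical criterion chosen for this sketch).
    """
    num_bins = len(mags[0])
    counts = []
    for i in range(num_bins):
        peak = max(src[i] for src in mags)
        if peak == 0.0:
            counts.append(0)  # silent bin: no active source
            continue
        counts.append(sum(1 for src in mags if src[i] >= rel_threshold * peak))
    return counts


def overlap_fraction(mags: Sequence[Sequence[float]],
                     rel_threshold: float = 0.1) -> float:
    """Fraction of TF bins in which two or more sources are active."""
    counts = active_source_counts(mags, rel_threshold)
    return sum(1 for c in counts if c >= 2) / len(counts)


def half_k_fraction(mags: Sequence[Sequence[float]],
                    rel_threshold: float = 0.1) -> float:
    """Fraction of TF bins whose active-source count does not exceed K/2."""
    k = len(mags)
    counts = active_source_counts(mags, rel_threshold)
    return sum(1 for c in counts if c <= k / 2) / len(counts)


# Toy example: K = 4 sources, 5 TF bins.
toy = [
    [1.0, 0.0, 0.5, 1.0, 0.0],
    [0.0, 1.0, 0.5, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
print(active_source_counts(toy))  # per-bin active-source counts
print(overlap_fraction(toy))      # share of bins with overlapping sources
print(half_k_fraction(toy))       # share of bins satisfying half-K
```

In this toy data only the fourth bin, where three of the four sources are active, violates the half-K condition. On real recordings the magnitudes would come from a short-time Fourier transform of each source, and the result depends on the chosen activity threshold.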

Highlights

Source separation is a major research area in both signal processing and the social internet of things. The information obtained by sound source separation can be widely used for speech enhancement, sound scene reconstruction, and spatial audio production [1,2,3,4,5].
A number of speakers are located in the horizontal plane of the microphone at different angles relative to the center of the B-format microphone [22], i.e., point O.
In order to eliminate the problem of poor separation quality caused by this phenomenon, we propose a multiple source separation method based on self-reduction of dimensionality by using a B-format microphone.

Summary

Introduction

Source separation is a major research area in both signal processing and the social internet of things. The information obtained by sound source separation can be widely used for speech enhancement, sound scene reconstruction, and spatial audio production [1,2,3,4,5]. Source separation is also a central problem in speech recognition and speaker identification [6,7,8,9]. There are several categories of source separation techniques.

Methods

Results

Conclusion