Abstract

Single-channel blind speech separation (SCBSS) and enhancement are widely used in real-time applications, and separation performance is crucial for them. However, such scenarios are complex: the number of speakers is unknown, the dialogue takes complex forms, noise pollution is severe, and prior information is difficult to obtain. Unsupervised separation of single-channel, multi-speaker mixed speech therefore remains challenging. In this paper, an unsupervised speech separation algorithm that combines Convolutional Non-Negative Matrix Factorization (CNMF) with Joint Approximate Diagonalization of Eigenmatrices (JADE), denoted CNMF+JADE, is applied to detected overlapped speech that includes the target speaker. Experimental results show that the proposed method can effectively extract the target speaker from speech mixtures generated by convolving TIMIT speech sources. Compared with traditional supervised learning methods, the proposed method needs only a small amount of training data to obtain the target speaker's speech from single-channel speech without supervision, which makes it more universal and robust. It provides a reference solution for extracting the target speaker's speech in complex multi-speaker scenarios.
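
As a rough illustration of the factorization idea underlying CNMF, the following minimal Python sketch factors a non-negative matrix with standard multiplicative updates. It is plain (non-convolutive) NMF under a Euclidean cost and is not the CNMF+JADE method of the paper: the function name `nmf`, its parameters, and the random toy matrix standing in for a magnitude spectrogram are all assumptions made for illustration.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V (freq x time) as W @ H.

    Plain NMF with multiplicative updates for the Euclidean cost;
    a simplified stand-in for the convolutive variant (CNMF).
    """
    rng = np.random.default_rng(seed)
    n_freq, n_time = V.shape
    W = rng.random((n_freq, rank)) + eps   # spectral bases (hypothetical name)
    H = rng.random((rank, n_time)) + eps   # time activations (hypothetical name)
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy non-negative matrix used in place of a real magnitude spectrogram.
rng = np.random.default_rng(1)
V = rng.random((64, 4)) @ rng.random((4, 100))

W, H = nmf(V, rank=4)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.4f}")
```

In a separation setting, the columns of `W` would play the role of spectral bases and the rows of `H` their activations over time; a convolutive model would replace each basis vector with a short time-frequency patch.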
