DEEP ATTRACTOR NETWORK FOR SINGLE-MICROPHONE SPEAKER SEPARATION

Zhuo Chen, Yi Luo, Nima Mesgarani

doi:10.1109/icassp.2017.7952155

Abstract

Despite the overwhelming success of deep learning in various speech processing tasks, the problem of separating simultaneous speakers in a mixture remains challenging. Two major difficulties in such systems are the arbitrary source permutation and the unknown number of sources in the mixture. We propose a novel deep learning framework for single-channel speech separation that creates attractor points in a high-dimensional embedding space of the acoustic signals; these attractors pull together the time-frequency bins corresponding to each source. Attractor points in this study are created by finding the centroids of the sources in the embedding space, and they are subsequently used to determine the similarity of each bin in the mixture to each source. The network is then trained to minimize the reconstruction error of each source by optimizing the embeddings. The proposed model differs from prior work in that it is trained end-to-end and does not depend on the number of sources in the mixture. Two strategies are explored at test time, K-means and fixed attractor points, where the latter requires no post-processing and can be implemented in real time. We evaluated our system on the Wall Street Journal dataset and show a 5.49% improvement over the previous state-of-the-art methods.
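The pipeline described in the abstract (centroid attractors, bin-to-attractor similarity, reconstruction error, and K-means at test time) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the dot-product similarity followed by a softmax, the mean-squared-error objective, and the toy embeddings are all assumptions made for illustration. In the actual system the embeddings would come from a trained network.

```python
import numpy as np

def attractors_from_assignments(V, Y):
    """Centroid of each source in embedding space.
    V: (TF, K) embeddings, one row per time-frequency bin.
    Y: (TF, C) one-hot source assignment per bin (known during training).
    Returns (C, K) attractor points."""
    counts = Y.sum(axis=0)[:, None]
    return (Y.T @ V) / np.maximum(counts, 1e-8)

def masks_from_attractors(V, A):
    """Soft masks from bin-to-attractor similarity.
    Assumption: similarity is a dot product, normalized with a softmax."""
    logits = V @ A.T                              # (TF, C)
    logits = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

def reconstruction_error(mix_mag, src_mag, M):
    """Mean squared error between true and masked-mixture magnitudes.
    mix_mag: (TF,), src_mag: (TF, C), M: (TF, C)."""
    est = M * mix_mag[:, None]
    return float(np.mean((src_mag - est) ** 2))

def kmeans_attractors(V, num_sources, num_iters=10):
    """Test-time attractors when assignments are unknown: plain K-means.
    Assumption: deterministic init from evenly spaced bins (illustrative)."""
    idx = np.linspace(0, len(V) - 1, num_sources).astype(int)
    centers = V[idx].astype(float).copy()
    for _ in range(num_iters):
        d = ((V[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        for c in range(num_sources):
            members = V[labels == c]
            if len(members) > 0:                  # keep old center if empty
                centers[c] = members.mean(axis=0)
    return centers

# Toy example: 4 time-frequency bins, 2-D embeddings, 2 sources.
V = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]])
Y = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
mix_mag = np.array([1.0, 2.0, 3.0, 4.0])
src_mag = Y * mix_mag[:, None]

A_train = attractors_from_assignments(V, Y)       # training-time attractors
M = masks_from_attractors(V, A_train)
loss = reconstruction_error(mix_mag, src_mag, M)
A_test = kmeans_attractors(V, num_sources=2)      # test-time attractors
```

On this toy input the K-means attractors coincide with the training-time centroids, because the two groups of bins are well separated. The fixed-attractor strategy mentioned in the abstract would instead reuse stored attractor points and skip the clustering step entirely.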

Journal: Proceedings of the ... IEEE International Conference on Acoustics, Speech, and Signal Processing. ICASSP (Conference)
Publication Date: Mar 1, 2017
Citations: 428

