Speaker-Independent Speech Separation With Deep Attractor Network

Yi Luo, Zhuo Chen, Nima Mesgarani

doi:10.1109/taslp.2018.2795749

Open Access

https://doi.org/10.1109/taslp.2018.2795749

Abstract

Despite the recent success of deep learning for many speech processing tasks, single-microphone, speaker-independent speech separation remains challenging for two main reasons. The first is the arbitrary order of the target and masker speakers in the mixture (the permutation problem), and the second is the unknown number of speakers in the mixture (the output dimension problem). We propose a novel deep learning framework for speech separation that addresses both of these issues. We use a neural network to project the time-frequency representation of the mixture signal into a high-dimensional embedding space. A reference point (attractor) is created in the embedding space to represent each speaker, and is defined as the centroid of that speaker's embeddings. The time-frequency embeddings of each speaker are then forced to cluster around the corresponding attractor point, which is used to determine the time-frequency assignment of the speaker. We propose three methods for finding the attractors for each source in the embedding space and compare their advantages and limitations. The objective function for the network is the standard signal reconstruction error, which enables end-to-end operation during both training and test phases. We evaluated our system using the Wall Street Journal dataset (WSJ0) on two- and three-speaker mixtures and report comparable or better performance than other state-of-the-art deep learning methods for speech separation.
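The attractor idea described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the embeddings here are synthetic rather than produced by a trained network, the function names are hypothetical, and the use of a softmax over embedding-attractor dot products to form masks is an assumption made for illustration. The sketch only shows how a centroid-style attractor, a time-frequency assignment, and a signal reconstruction error could fit together.

```python
import numpy as np

def attractors_from_assignment(embeddings, assignment):
    """Centroid of each speaker's embeddings.

    embeddings: (TF, K) array, one K-dim embedding per time-frequency bin.
    assignment: (TF, C) array, one-hot (or soft) speaker membership per bin.
    Returns a (C, K) array of attractor points.
    """
    counts = assignment.sum(axis=0)[:, None]               # (C, 1)
    return (assignment.T @ embeddings) / np.maximum(counts, 1e-8)

def masks_from_attractors(embeddings, attractors):
    """Per-bin speaker masks from similarity to each attractor.

    Assumption: similarity is a dot product, normalised with a softmax
    across speakers so the masks sum to one in every bin.
    """
    logits = embeddings @ attractors.T                     # (TF, C)
    logits = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

def reconstruction_error(mixture_mag, masks, sources_mag):
    """Mean squared error between masked mixture and clean source magnitudes."""
    estimates = masks * mixture_mag[:, None]               # (TF, C)
    return float(np.mean((estimates - sources_mag) ** 2))

# Toy data: two speakers whose embeddings form two well-separated clusters.
rng = np.random.default_rng(0)
TF, K, C = 200, 20, 2
labels = rng.integers(0, C, size=TF)
assignment = np.eye(C)[labels]                             # one-hot, (TF, C)
centers = np.stack([np.full(K, 3.0), np.full(K, -3.0)])    # synthetic clusters
embeddings = centers[labels] + 0.1 * rng.standard_normal((TF, K))

mixture_mag = rng.random(TF) + 0.5                         # synthetic magnitudes
sources_mag = assignment * mixture_mag[:, None]            # each bin has one owner

attractors = attractors_from_assignment(embeddings, assignment)
masks = masks_from_attractors(embeddings, attractors)
error = reconstruction_error(mixture_mag, masks, sources_mag)
```

In this toy setting the attractors are computed from known speaker assignments, which is only possible when the clean sources are available; the abstract mentions three methods for finding attractors, and this sketch does not attempt to reproduce them.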

Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publication Date: Apr 1, 2018
Citations: 256
License type: publisher-specific, author manuscript

Similar Papers

Deep Attractor Network for Single-Microphone Speaker Separation
Zhuo Chen ... Nima Mesgarani
Proceedings of the ... IEEE International Conference on Acoustics, Speech, and Signal Processing. ICASSP (Conference) | VOL. 2017
01 Mar 2017

Developing a Chatbot system using Deep Learning based for Universities consultancy
Thuong Le-Tien ... Vy Huynh-Y
03 Jan 2022

Seek Common While Shelving Differences: Orchestrating Deep Neural Networks for Edge Service Provisioning
Lixing Chen ... Jie Xu
IEEE Journal on Selected Areas in Communications | VOL. 39
16 Dec 2020

Relevance feedback for Content-based Image Retrieval using deep learning
Heng Xu ... Jun-Yi Wang
01 Jun 2017
