Abstract

Talker-independent monaural speaker separation aims to separate concurrent speakers from a single-microphone recording. Inspired by human auditory scene analysis (ASA) mechanisms, a two-stage deep CASA approach was recently proposed to address this problem, and it achieves state-of-the-art results in separating mixtures of two or three speakers. A main limitation of deep CASA is that it is a non-causal system, whereas many speech processing applications, e.g., telecommunication and hearing prostheses, require causal processing. In this study, we propose a causal version of deep CASA to address this limitation. First, we modify the temporal connections, normalization, and clustering algorithms in deep CASA so that no future information is used anywhere in the deep network. We then train a C-speaker (C ≥ 2) deep CASA system in a speaker-number-independent fashion, so that it generalizes to speech mixtures with up to C speakers without prior knowledge of the number of speakers. Experimental results show that causal deep CASA achieves excellent speaker separation performance with both known and unknown speaker numbers.
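To make the causality constraint concrete, the following is a minimal sketch of two generic building blocks that use no future information: a causal 1-D convolution and a cumulative (running) normalization. It is an illustration only and is not taken from deep CASA; the function names `causal_conv1d` and `cumulative_normalize`, the `dilation` and `eps` parameters, and the choice of running mean and variance statistics are all assumptions made for this example.

```python
def causal_conv1d(signal, kernel, dilation=1):
    """Causal 1-D convolution (hypothetical helper).

    output[t] depends only on signal[t], signal[t - dilation], ...
    Samples before the start of the signal are treated as zeros.
    """
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # never index into the future, zero-pad the past
                acc += w * signal[idx]
        out.append(acc)
    return out


def cumulative_normalize(frames, eps=1e-8):
    """Normalize each frame with statistics of frames seen so far (hypothetical helper).

    `frames` is a list of equal-length feature vectors ordered in time.
    The mean and variance at frame t are computed over frames 0..t only.
    """
    out = []
    count = 0
    total = 0.0
    total_sq = 0.0
    for frame in frames:
        count += len(frame)
        total += sum(frame)
        total_sq += sum(x * x for x in frame)
        mean = total / count
        var = max(total_sq / count - mean * mean, 0.0)
        scale = (var + eps) ** 0.5
        out.append([(x - mean) / scale for x in frame])
    return out
```

In both functions, changing a later input sample or frame leaves all earlier outputs unchanged, which is the property a causal system must satisfy.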
