Abstract

Sound source separation is an essential aspect of auditory scene analysis and remains a pressing challenge for machine hearing. In this paper, a fully convolutional time-domain audio separation network (ConvTasNet) is trained for universal two-source separation over mixtures of speech, environmental sounds, and music. Beyond the network's separation performance, our main concern is the separation mechanisms it learns. Through a series of classic auditory segregation experiments, we systematically explore the principles the network has learned for simultaneous and sequential organization. The results show that, without any prior knowledge of auditory scene analysis imparted to it, the network spontaneously learns from raw waveforms separation mechanisms similar to those that have developed over many years in humans. The Gestalt principles of separation in the human auditory system prove effective in our network: harmonicity, onset synchrony and common fate (coherent modulation in amplitude and frequency), proximity, continuity, and similarity. A universal sound source separation network that follows Gestalt principles is not limited to specific sources and can be applied to diverse acoustic situations, much as human hearing is, suggesting new directions for solving the problem of auditory scene analysis.
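To make the setup concrete, the sketch below shows how a two-source ConvTasNet could be instantiated and trained in PyTorch. This is a minimal sketch, not the authors' implementation: the torchaudio model class, the SI-SNR loss, and the two-source permutation-invariant handling are standard choices for this architecture, and the sample rate, signal length, and default hyperparameters shown here are illustrative assumptions.

```python
import torch
from torchaudio.models import ConvTasNet

# Illustrative two-source configuration; the paper's exact hyperparameters,
# training mixtures, and optimization settings are not reproduced here.
model = ConvTasNet(num_sources=2)

def si_snr(estimate: torch.Tensor, target: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Scale-invariant SNR (dB), the loss commonly used to train ConvTasNet."""
    target = target - target.mean(dim=-1, keepdim=True)
    estimate = estimate - estimate.mean(dim=-1, keepdim=True)
    # Project the estimate onto the target to isolate the "signal" component.
    scale = (estimate * target).sum(dim=-1, keepdim=True) / (target.pow(2).sum(dim=-1, keepdim=True) + eps)
    signal = scale * target
    noise = estimate - signal
    return 10 * torch.log10(signal.pow(2).sum(dim=-1) / (noise.pow(2).sum(dim=-1) + eps) + eps)

def pit_si_snr_loss(est: torch.Tensor, ref: torch.Tensor) -> torch.Tensor:
    """Permutation-invariant loss for two sources: score both orderings, keep the better."""
    perm1 = si_snr(est, ref).mean(dim=-1)               # estimates matched to (ref1, ref2)
    perm2 = si_snr(est, ref.flip(dims=[1])).mean(dim=-1)  # estimates matched to (ref2, ref1)
    return -torch.maximum(perm1, perm2).mean()

# A stand-in 4-second mixture at 8 kHz, shape (batch, channel=1, time).
mixture = torch.randn(1, 1, 32000)
separated = model(mixture)                  # -> (batch, 2, time)

# Stand-in ground-truth sources for the loss; in practice these would be the
# clean speech/environmental/music signals used to build the mixture.
references = torch.randn(1, 2, 32000)
loss = pit_si_snr_loss(separated, references)
loss.backward()
```

Because the loss is permutation invariant, the network is never told which output slot corresponds to which source, which is what allows the same model to be probed afterwards with arbitrary stimulus pairs in the segregation experiments.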
