Multi-channel Environmental Sound Segmentation utilizing Sound Source Localization and Separation U-Net

Yui Sudo, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai

doi:10.1109/ieeeconf49454.2021.9382730

Abstract

This paper proposes a multi-channel environmental sound segmentation method. Environmental sound segmentation is an integrated task that covers sound source localization, sound source separation, and class identification. When multiple microphones are available, spatial features can be used to improve the separation accuracy of signals arriving from different directions. Conventional methods, however, have two drawbacks: (a) because sound source localization and separation using spatial features and class identification using spectral features are trained in the same neural network, the network overfits to the relationship between the direction of arrival and the class; (b) although the permutation invariant training used in speech recognition could be extended, it is not practical for environmental sounds because of its limit on the maximum number of speakers. The proposed method combines a U-Net, which simultaneously performs sound source localization and sound source separation, with a convolutional neural network that classifies the separated sounds. This design prevents overfitting to the relationship between the direction of arrival and the class. Simulation experiments using the created datasets, which contain 75 classes of environmental sounds, showed that the root mean squared error of the proposed method was lower than that of the conventional method.
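The two-stage structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `segment` and `rmse`, the dictionary keyed by direction, and the nested-list spectrograms are all assumptions made for illustration. The point it shows is only that the classifier receives each separated sound without its direction of arrival, so it cannot learn a direction-to-class shortcut.

```python
import math


def segment(mixture, separator, classifier):
    """Hypothetical two-stage pipeline (names and data layout are assumptions).

    separator:  callable standing in for the localization/separation U-Net;
                assumed to return {direction: separated_spectrogram}.
    classifier: callable standing in for the classification CNN;
                it is given a spectrogram only, never the direction.
    """
    separated = separator(mixture)
    # The direction is used only as a dictionary key, not as classifier input.
    return {direction: classifier(spec) for direction, spec in separated.items()}


def rmse(predicted, target):
    """Root mean squared error between two equally shaped 2-D arrays (lists of rows)."""
    squared = [
        (p - t) ** 2
        for p_row, t_row in zip(predicted, target)
        for p, t in zip(p_row, t_row)
    ]
    return math.sqrt(sum(squared) / len(squared))
```

In this sketch any separation model and any classifier can be passed in as callables, and `rmse` compares a predicted segmentation mask with its ground truth in the same way the abstract's evaluation metric is defined.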

Similar Papers

Multichannel environmental sound segmentation
Yui Sudo ... Kenji Nishida
Applied Intelligence | VOL. 51
30 Mar 2021

Sound Source Localization and Separation
Kazuhiro Nakadai ... Keisuke Nakamura
15 Jun 2015

Binaural Localization of Multiple Sound Sources by Non-Negative Tensor Factorization
Elie Laurent Benaroya ... Nicolas Obin
IEEE/ACM Transactions on Audio, Speech, and Language Processing | VOL. 26
01 Jun 2018

A real-time super-resolution robot audition system that improves the robustness of simultaneous speech recognition
Keisuke Nakamura ... Hiroshi G Okuno
Advanced Robotics | VOL. 27
01 Aug 2013
