Abstract
This paper proposes a multichannel environmental sound segmentation method. Environmental sound segmentation is an integrated method that achieves sound source localization, sound source separation, and classification simultaneously. When multiple microphones are available, spatial features can be used to improve the localization and separation accuracy of sounds arriving from different directions; however, conventional methods have three drawbacks: (a) When sound source localization and separation using spatial features and classification using spectral features are trained in the same neural network, the network may overfit to the relationship between the direction of arrival and the class of a sound, reducing its reliability on novel events. (b) Although permutation invariant training used in automatic speech recognition could be extended, it is impractical for environmental sounds, which include an unlimited number of sound sources. (c) Various features, such as the complex values of the short-time Fourier transform and interchannel phase differences, have been used as spatial features, but no study has compared them. This paper proposes a multichannel environmental sound segmentation method comprising two discrete blocks: a sound source localization and separation block and a sound source separation and classification block. By separating the blocks, overfitting to the relationship between the direction of arrival and the class is avoided. Simulation experiments on synthesized datasets containing 75 classes of environmental sounds showed that the root mean squared error (RMSE) of the proposed method was lower than that of conventional methods.
Highlights
Various methods such as sound source localization (SSL), sound source separation (SSS), and classification have been proposed in acoustic signal processing, robot audition, and machine learning for use in real-world environments containing multiple overlapping sound events [1,2,3]. Conventional approaches use the cascade method, incorporating individual functions based on array signal processing techniques [4,5,6].
The sound source localization and separation (SSLS) block does not completely separate sounds arriving from nearby directions, and the errors caused by the SSLS block accumulate.
The proposed structure, in which the classification block of the SSLS + Classification structure was replaced by the sound source separation and classification (SSSC) block, clearly had a smaller RMSE; the SSLS + Classification structure could not correct errors occurring in the SSLS block, whereas the SSSC block, by including the separation function, reduced the propagation of errors from the SSLS block.
Summary
Various methods such as sound source localization (SSL), sound source separation (SSS), and classification have been proposed in acoustic signal processing, robot audition, and machine learning for use in real-world environments containing multiple overlapping sound events [1,2,3]. In addition to the magnitude spectra, using the interchannel phase difference (IPD) between microphones as a spatial feature has been reported to improve automatic speech recognition (ASR) performance for overlapping sounds containing multiple speakers. Deep learning-based methods for sound event localization and detection (SELD) have been proposed [18,19,20,21]. These methods simultaneously perform SSL and sound event detection (SED) of environmental sounds. A comparison of various spatial features revealed the sine and cosine of IPDs to be optimal for sound source localization and separation.
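To make the sine/cosine IPD features concrete, the following is a minimal sketch of how such features could be computed from a two-channel recording. The function names, window choice, and STFT parameters here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Naive STFT with a Hann window; returns a (frames, freq bins) array."""
    window = np.hanning(n_fft)
    frames = []
    for start in range(0, len(x) - n_fft + 1, hop):
        frames.append(np.fft.rfft(x[start:start + n_fft] * window))
    return np.array(frames)

def ipd_features(x_ref, x_other, n_fft=512, hop=128):
    """Sine and cosine of the interchannel phase difference per TF bin."""
    s_ref = stft(x_ref, n_fft, hop)
    s_other = stft(x_other, n_fft, hop)
    ipd = np.angle(s_other) - np.angle(s_ref)  # phase difference in radians
    return np.sin(ipd), np.cos(ipd)

# Example: a 1 kHz tone reaching the second microphone with a small delay,
# mimicking a source offset from the array's broadside direction.
fs = 16000
t = np.arange(fs) / fs
ch0 = np.sin(2 * np.pi * 1000 * t)
ch1 = np.roll(ch0, 8)  # 8-sample interchannel delay
sin_ipd, cos_ipd = ipd_features(ch0, ch1)
print(sin_ipd.shape, cos_ipd.shape)
```

Encoding the IPD as its sine and cosine avoids the 2π wrap-around discontinuity of the raw phase difference, which is one plausible reason such features work well as network inputs.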