Abstract

Analyzing ambient sound is useful when developing sound-based intelligent systems. Acoustic scene classification (ASC) is the task of identifying the environment in which an audio clip was recorded from among a set of predefined scenes. ASC has considerable potential for use in urban sound event classification systems. This research presents a hybrid method, including a novel mathematical fusion step, that aims to address the accuracy and adaptability limitations of current state-of-the-art ASC models. The proposed method takes a stereo signal as input and combines two random subspace ensemble classifiers through the fusion step. A stable, invariant representation of the stereo signal is built using the Wavelet Scattering Transform (WST). For each mono channel, i.e., left and right, a separate random subspace classifier is trained on the WST features. A novel mathematical formula was developed for the fusion step, with its parameters found using a genetic algorithm. On the DCASE 2017 dataset, the proposed method achieved a higher classification accuracy (about 95%) than existing methods.
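
The fusion idea described above can be illustrated with a minimal sketch. The paper's actual fusion formula is not given in this summary, so the sketch substitutes a simple convex weighted sum of the two channels' class probabilities, with the single weight tuned by a small genetic algorithm. The function names (`fuse`, `predict`, `accuracy`, `genetic_search`), the toy probabilities, and all hyperparameters are hypothetical. The WST feature extraction and the per-channel random subspace classifiers are assumed to have already produced the probabilities.

```python
import random

def fuse(p_left, p_right, w):
    """Assumed fusion rule: convex combination of per-channel class probabilities."""
    return [w * l + (1.0 - w) * r for l, r in zip(p_left, p_right)]

def predict(p_left, p_right, w):
    """Index of the class with the highest fused probability."""
    fused = fuse(p_left, p_right, w)
    return max(range(len(fused)), key=fused.__getitem__)

def accuracy(w, samples):
    """Fraction of (p_left, p_right, label) samples classified correctly."""
    hits = sum(predict(pl, pr, w) == y for pl, pr, y in samples)
    return hits / len(samples)

def genetic_search(samples, pop_size=20, generations=30, sigma=0.1, seed=0):
    """Tiny genetic algorithm over the fusion weight w in [0, 1]."""
    rng = random.Random(seed)
    population = [rng.random() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda w: accuracy(w, samples), reverse=True)
        parents = ranked[: pop_size // 2]  # elitist selection: the best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Arithmetic crossover followed by Gaussian mutation, clipped to [0, 1].
            child = 0.5 * (a + b) + rng.gauss(0.0, sigma)
            children.append(min(1.0, max(0.0, child)))
        population = parents + children
    return max(population, key=lambda w: accuracy(w, samples))

# Hypothetical classifier outputs for four clips and three scenes:
# (left-channel probabilities, right-channel probabilities, true scene index).
samples = [
    ([0.7, 0.2, 0.1], [0.4, 0.5, 0.1], 0),
    ([0.3, 0.6, 0.1], [0.2, 0.3, 0.5], 1),
    ([0.2, 0.3, 0.5], [0.1, 0.2, 0.7], 2),
    ([0.5, 0.4, 0.1], [0.1, 0.8, 0.1], 1),
]

best_w = genetic_search(samples)
print("left only :", accuracy(1.0, samples))
print("right only:", accuracy(0.0, samples))
print("fused     :", accuracy(best_w, samples), "with w =", round(best_w, 3))
```

On this toy data, the left channel alone classifies three of the four clips correctly and the right channel alone classifies two, while a fused weight of roughly 0.29 to 0.875 classifies all four. This only illustrates why a tuned fusion of two channel-specific classifiers can outperform either one; it says nothing about the reported results.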

Highlights

  • Analyzing ambient sound is useful when developing sound-based intelligent systems

  • Acoustic scene classification (ASC) is a subset of computer audition (CA), i.e., algorithms and systems that use machine learning to understand audio

  • ASC systems face many challenges. One is the variety of inputs: the quality of microphones or audio sensors varies, the number of recordings per scene varies, and sensors can be mono or stereo [3]


Summary

Introduction

Analyzing ambient sound is useful when developing sound-based intelligent systems. In the last few years, such systems have received considerable attention in both indoor and outdoor scenarios. ASC is a subset of computer audition (CA), i.e., algorithms and systems that use machine learning to understand audio. Computer audition systems apply intelligent algorithms to extract meaningful information from audio data [2]. In some of these systems, ASC is a preprocessing step that identifies the scene in which the audio was recorded, e.g., an airport, park, or subway. ASC systems face many challenges. One is the variety of inputs: the quality of microphones or audio sensors varies, the number of recordings per scene varies, and sensors can be mono (single channel) or stereo (dual channel) [3].

