Sound event recognition in unstructured environments using spectrogram image processing

Jonathan William Dennis

doi:10.32657/10356/59272

Abstract

The objective of this research is to develop feature extraction and classification techniques for the task of sound event recognition (SER) in unstructured environments. Although this field is traditionally overshadowed by the popular field of automatic speech recognition (ASR), an SER system that can achieve human-like sound recognition performance opens up a range of novel application areas. These include acoustic surveillance, bio-acoustical monitoring, environmental context detection, healthcare applications and, more generally, the rich transcription of acoustic environments. The challenges in such environments are adverse effects such as noise, distortion and multiple sources, which are more likely to occur with distant microphones than with the close-talking microphones that are more common in ASR. In addition, the characteristics of acoustic events are less well defined than those of speech, and there is no sub-word dictionary available like the phonemes in speech. Therefore, the performance of ASR systems typically degrades dramatically in these unstructured environments, and it is important to develop new methods that can perform well for this challenging task.

In this thesis, the approach taken is to interpret the sound event as a two-dimensional spectrogram image, with time and frequency as the two axes. This enables novel methods for SER to be developed based on spectrogram image processing, inspired by techniques from the field of image processing. The motivation for such an approach is to find an automatic approach to “spectrogram reading”, where it is possible for humans to visually recognise the different sound event signatures in the spectrogram. The advantages of such an approach are twofold. Firstly, the sound event image representation makes it possible to naturally capture the sound information in a two-dimensional feature. This has advantages over conventional one-dimensional frame-based features, which capture only a slice of spectral information
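As a rough illustration of what treating a sound as a two-dimensional time-frequency image can mean in practice, the following minimal sketch computes a log-power spectrogram with a short-time Fourier transform and scales it to the range [0, 1], like a greyscale image. It is not the thesis's feature extraction method: the function name `spectrogram_image`, the Hann window, the frame length, the hop size and the min-max scaling are all assumptions chosen only for illustration.

```python
import numpy as np


def spectrogram_image(signal, frame_len=256, hop=128):
    """Return a (frequency x time) log-power spectrogram scaled to [0, 1].

    Hypothetical helper: the frame length, hop size and scaling are
    illustrative assumptions, not values taken from the thesis.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.size < frame_len:
        raise ValueError("signal is shorter than one frame")

    # Slice the signal into overlapping, Hann-windowed frames.
    window = np.hanning(frame_len)
    n_frames = 1 + (signal.size - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )

    # Power spectrum of each frame: shape (time, frequency).
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2

    # Log compression, then min-max scaling to greyscale intensities.
    log_power = np.log(power + 1e-10)
    lo, hi = log_power.min(), log_power.max()
    if hi > lo:
        image = (log_power - lo) / (hi - lo)
    else:
        image = np.zeros_like(log_power)

    # Transpose so rows are frequency bins and columns are time frames.
    return image.T


# Usage: a one-second 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)
image = spectrogram_image(tone)
```

Each column of `image` corresponds to one frame-based spectral slice, while the full array keeps the evolution of those slices over time, which is the two-dimensional view the abstract contrasts with one-dimensional frame-based features.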
