Abstract

In training a deep learning system to perform audio transcription, two practical problems may arise. Firstly, most datasets are weakly labelled, providing only a list of the events present in each recording, without any temporal information, for training. Secondly, deep neural networks need a very large amount of labelled training data to perform well, yet in practice it is difficult to collect enough samples for most classes of interest. In this paper, we propose factorising the final task of audio transcription into multiple intermediate tasks in order to improve training performance when dealing with low-resource datasets of this kind. We evaluate three data-efficient approaches to training a stacked convolutional and recurrent neural network for the intermediate tasks. Our results show that the different training methods have different advantages and disadvantages.
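
As a rough illustration of the model family the abstract refers to, the sketch below shows a stacked convolutional and recurrent tagger that outputs frame-level event probabilities and pools them over time into a clip-level tag, so it can be trained from weak labels. The framework (PyTorch) and all layer sizes are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class CRNNTagger(nn.Module):
    """Illustrative stacked convolutional-recurrent tagger (sizes are placeholders)."""

    def __init__(self, n_mels=40, n_classes=1):
        super().__init__()
        # Convolutional front end: pool only along frequency so the time axis
        # keeps its resolution for frame-level event detection.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d((1, 2)),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d((1, 2)),
        )
        # Recurrent layer models temporal context across frames.
        self.gru = nn.GRU(64 * (n_mels // 4), 64, batch_first=True, bidirectional=True)
        self.frame_out = nn.Linear(128, n_classes)

    def forward(self, x):                          # x: (batch, 1, time, n_mels) log-mel input
        h = self.conv(x)                           # (batch, 64, time, n_mels // 4)
        h = h.permute(0, 2, 1, 3).flatten(2)       # (batch, time, features)
        h, _ = self.gru(h)
        frame_probs = torch.sigmoid(self.frame_out(h))    # frame-level detections
        clip_probs = frame_probs.max(dim=1).values         # pool over time -> clip-level tag
        return frame_probs, clip_probs
```

Max pooling over time is only one way to map frame predictions to a clip-level tag; mean or attention pooling are common alternatives in weakly-labelled event detection.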

Highlights

  • Machine learning has experienced strong growth in recent years, due to increased dataset sizes and computational power, and to advances in deep learning methods that can learn to make predictions in extremely nonlinear problem settings [1]

  • We propose a factorisation of the final full transcription task into multiple simpler intermediate tasks of audio event detection and audio tagging in order to predict an intermediate transcription that can be used to boost the performance of the full transcription task

  • Many low-resource datasets are used for discriminating subclasses of a general class, e.g., songs of different bird species, sounds of different car engines, barking of different dog breeds, or notes produced by an instrument. These subclasses usually share common features and characteristics, so in order to achieve good performance in the audio event detection task, we propose considering all subclasses as one general class and training a single WHEN network to perform single-class event detection (see the sketch after this list)
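
A minimal sketch of the label collapsing described in the last highlight, assuming weak labels are stored as per-recording sets of tags (the function name, tag names, and recording ids below are hypothetical):

```python
def collapse_to_general_class(weak_labels, subclasses):
    """Map per-recording subclass tags to one binary 'event present' target.

    weak_labels: dict mapping recording id -> set of tags present in that clip.
    subclasses:  set of subclass tags that make up the general class.
    Returns a dict mapping recording id -> 0/1 target for a single-class
    WHEN (event detection) network.
    """
    return {rec_id: int(bool(tags & subclasses))
            for rec_id, tags in weak_labels.items()}


weak_labels = {
    "rec_001": {"species_a", "rain"},
    "rec_002": {"wind"},
    "rec_003": {"species_b"},
}
bird_species = {"species_a", "species_b", "species_c"}

print(collapse_to_general_class(weak_labels, bird_species))
# {'rec_001': 1, 'rec_002': 0, 'rec_003': 1}
```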


Introduction

Machine learning has experienced strong growth in recent years, due to increased dataset sizes and computational power, and to advances in deep learning methods that can learn to make predictions in extremely nonlinear problem settings [1]. With the growing number of publicly available audio datasets, the number of tagging labels available for them has also increased. We refer to these tagging labels, which only indicate the presence or absence of a type of event in a recording and lack any temporal information about it, as weak labels. In [3], the authors proposed a shrinking deep neural network incorporating unsupervised feature learning to handle multi-label audio tagging. In [4,5], the authors use stacked convolutional recurrent networks to perform environmental audio tagging and to tag the presence of birdsong, respectively. In [6], the authors explore two different models for end-to-end music audio tagging when a large amount of training data is available.
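
To make the distinction concrete, the snippet below contrasts a weak (clip-level) label with a strong (temporally annotated) label for a single recording; the file name, tags, and times are invented for illustration only.

```python
# Weak label: only which event types occur somewhere in the clip.
weak_label = {"recording": "clip_017.wav", "tags": ["birdsong", "rain"]}

# Strong label: the same events annotated with onset/offset times in seconds,
# which is what a full transcription system ultimately has to produce.
strong_label = {
    "recording": "clip_017.wav",
    "events": [
        {"tag": "birdsong", "onset": 1.2, "offset": 3.8},
        {"tag": "rain",     "onset": 0.0, "offset": 10.0},
    ],
}
```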
