Abstract

Recent methodologies for audio classification frequently involve cepstral and spectral features applied to single-channel recordings of acoustic scenes and events. Furthermore, the concept of transfer learning has been widely used over the years and has proven to be an efficient alternative to training neural networks from scratch. The lower time and resource requirements of using pre-trained models allow for more versatility in developing classification systems. However, information on classification performance when using different features for multi-channel recordings is often limited. Moreover, pre-trained networks are initially trained on large databases and are often unnecessarily large. This poses a challenge when developing systems for devices with limited computational resources, such as mobile or embedded devices. This paper presents a detailed study of the most prominent and widely used cepstral and spectral features for multi-channel audio applications. Accordingly, we propose the use of spectro-temporal features. Additionally, the paper details the development of a compact version of the AlexNet model for computationally limited platforms through studies of performance under various architectural and parameter modifications of the original network. The aim is to minimize the network size while maintaining the series network architecture and preserving the classification accuracy. Considering that other state-of-the-art compact networks use complex directed acyclic graphs, a series architecture offers an advantage in customizability. Experimentation was carried out in MATLAB, using a database that we generated for this task, which consists of four-channel synthetic recordings of both sound events and scenes. The top-performing methodology achieved a weighted F1-score of 87.92% for scalogram features classified via the modified AlexNet-33 network, which has a size of 14.33 MB; the original AlexNet network returned 86.24% at a size of 222.71 MB.
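
The abstract does not list the specific layer changes that yield AlexNet-33, so the MATLAB sketch below is only illustrative of the kind of series-preserving modification described: loading the pre-trained AlexNet, shrinking its fully connected layers, retraining on scalogram images, and scoring with a weighted F1. The layer sizes, class count, folder layout, file names, and training options are assumptions, not the paper's configuration.

    % Illustrative sketch only: the exact AlexNet-33 layer changes are not given
    % in this abstract. Requires Deep Learning Toolbox and the "Deep Learning
    % Toolbox Model for AlexNet Network" support package.
    net    = alexnet;                 % pre-trained 25-layer series network
    layers = net.Layers;

    % Assumed folder layout: 227x227x3 scalogram images sorted into class folders.
    imdsTrain = imageDatastore('scalograms/train', ...
        'IncludeSubfolders', true, 'LabelSource', 'foldernames');
    imdsVal   = imageDatastore('scalograms/val', ...
        'IncludeSubfolders', true, 'LabelSource', 'foldernames');
    numClasses = numel(categories(imdsTrain.Labels));

    % Hypothetical size reduction: shrink the two 4096-unit fully connected
    % layers (standard AlexNet indices 17 and 20) and replace the output layers,
    % keeping a plain series topology rather than a directed acyclic graph.
    layers(17) = fullyConnectedLayer(256, 'Name', 'fc6_small');
    layers(20) = fullyConnectedLayer(128, 'Name', 'fc7_small');
    layers(23) = fullyConnectedLayer(numClasses, 'Name', 'fc8_out');
    layers(25) = classificationLayer('Name', 'output');

    opts = trainingOptions('sgdm', ...
        'InitialLearnRate', 1e-4, ...
        'MaxEpochs', 20, ...
        'ValidationData', imdsVal, ...
        'Verbose', false);
    compactNet = trainNetwork(imdsTrain, layers, opts);

    % Weighted F1-score (the metric reported in the abstract), computed from a
    % confusion matrix over the validation set.
    predLabels = classify(compactNet, imdsVal);
    C       = confusionmat(imdsVal.Labels, predLabels);
    tp      = diag(C);
    prec    = tp ./ max(sum(C, 1)', 1);   % per-class precision
    rec     = tp ./ max(sum(C, 2), 1);    % per-class recall
    f1      = 2 * prec .* rec ./ max(prec + rec, eps);
    support = sum(C, 2);                  % true instances per class
    weightedF1 = sum(f1 .* support) / sum(support);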

Highlights

  • The continuous research advances in the field of single- and multi-channel audio classification suggest its importance and relevance in a broad range of real-world applications. In this work, we focus on domestic multi-channel audio classification, which can be applied to monitoring systems and assistive technology [1,2]. The majority of existing works in this area are based on the classification of sound events found in single-channel audio [3,4] rather than the classification of multi-channel audio signals containing acoustic scenes, which is required to understand the continuous nature of daily domestic activities

  • We propose the use of spectro-temporal features in the form of scalograms, which are computed through a fast Fourier transform (FFT)-based continuous wavelet transform (CWT) [10]; an illustrative extraction sketch follows this list

  • Per-level and average comparisons of mel-frequency cepstral coefficients (MFCC) and Log-Mel spectrogram features against the proposed CWTFT scalogram method are shown in Table 3; each result is an average of three training trials

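Since the highlights do not specify the CWT configuration, the following MATLAB sketch (Wavelet Toolbox) only illustrates how an FFT-based CWT scalogram might be computed per channel and exported as a network-ready image; the file names, the default analytic Morse wavelet, and the 227x227 output size are assumptions.

    % Illustrative per-channel scalogram extraction (assumed settings).
    % Requires the Wavelet and Image Processing Toolboxes.
    [x, fs] = audioread('scene_0001.wav');     % assumed four-channel recording

    for ch = 1:size(x, 2)
        % cwt uses an FFT-based implementation of the continuous wavelet
        % transform; the default analytic Morse wavelet is assumed here.
        wt   = cwt(x(:, ch), fs);
        scal = abs(wt);                        % scalogram = CWT magnitude

        % Rescale and resize to the 227x227x3 input of AlexNet-style networks.
        img = imresize(rescale(scal), [227 227]);
        img = repmat(im2uint8(img), 1, 1, 3);
        imwrite(img, sprintf('scalograms/scene_0001_ch%d.png', ch));  % assumes folder exists
    end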

Introduction

The continuous research advances in the field of single- and multi-channel audio classification suggest its importance and relevance in a broad range of real-world applications. In this work, we focus on domestic multi-channel audio classification, which can be applied to monitoring systems and assistive technology [1,2]. The majority of existing works in this area are based on the classification of sound events found in single-channel audio [3,4] rather than the classification of multi-channel audio signals containing acoustic scenes, which is required to understand the continuous nature of daily domestic activities. Detection using multi-channel audio was found to be 10% more accurate than using single-channel audio, considering the case of overlapping sounds that commonly occur in real life [6]. A similar concept to this work is the Detection and Classification of Acoustic Scenes and Events (DCASE) 2018 Task 5 challenge, which focuses on domestic multi-channel acoustic scene classification [7]. In this challenge, top-performing methods often involve the use of Log-Mel energies and Mel-frequency cepstral coefficients (MFCC).

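For reference, the sketch below shows how the Log-Mel and MFCC baseline features mentioned above might be extracted with the MATLAB Audio Toolbox from one channel of a recording; the 25 ms/10 ms framing, 40 mel bands, and file name are assumptions rather than the configuration used in the paper or in DCASE 2018 Task 5.

    % Illustrative baseline feature extraction (assumed parameters).
    % Requires the Audio and Signal Processing Toolboxes.
    [x, fs] = audioread('scene_0001.wav');
    xCh     = x(:, 1);                         % one channel of the recording

    % Log-Mel spectrogram: 40 assumed mel bands, 25 ms windows with 10 ms hop.
    win = round(0.025 * fs);
    hop = round(0.010 * fs);
    S = melSpectrogram(xCh, fs, ...
        'Window', hann(win, 'periodic'), ...
        'OverlapLength', win - hop, ...
        'NumBands', 40);
    logMel = 10 * log10(S + eps);              % log-mel energies in dB

    % Mel-frequency cepstral coefficients with the Audio Toolbox defaults.
    coeffs = mfcc(xCh, fs);                    % one row of coefficients per frame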