Abstract

With the growth of deep learning in various classification problems, many researchers have applied deep learning methods to environmental sound classification tasks. This paper introduces an end-to-end method for environmental sound classification based on a one-dimensional convolutional neural network with Bayesian optimization and ensemble learning, which learns feature representations directly from the audio signal. Several convolutional layers are used to capture the signal and learn filters relevant to the classification problem. The proposed method can handle audio signals of any length, because a sliding window divides the signal into overlapping frames. Bayesian optimization is used for hyperparameter selection, with models evaluated by cross-validation. Multiple models with different settings were developed based on Bayesian optimization to ensure network convergence in both convex and non-convex optimization. The proposed end-to-end model was evaluated on the UrbanSound8K dataset. It achieved a classification accuracy of 94.46%, which is 5% higher than existing end-to-end approaches, with fewer trainable parameters. Model performance was measured with the following indices: sensitivity, specificity, accuracy, precision, recall, F-measure, area under the ROC curve, and area under the precision-recall curve. The proposed approach outperformed state-of-the-art end-to-end approaches that use hand-crafted features as input in the selected measurement indices and in time complexity.
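
The sliding-window step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the frame length, the overlap, or how the final partial frame is handled, so the function name `frame_signal`, its parameters `frame_len` and `hop`, and the zero-padding of the last frame are assumptions made for illustration only, not the authors' implementation.

```python
from typing import List, Sequence


def frame_signal(signal: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split a 1-D signal into fixed-length frames taken every `hop` samples.

    Choosing hop < frame_len makes consecutive frames overlap.
    The last frame is zero-padded (an assumption) so that a signal of
    any length is fully covered by equally sized frames.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")

    frames: List[List[float]] = []
    start = 0
    while True:
        chunk = list(signal[start:start + frame_len])
        # Pad the final, shorter chunk with zeros up to frame_len.
        chunk += [0.0] * (frame_len - len(chunk))
        frames.append(chunk)
        # Stop once the current frame reaches the end of the signal.
        if start + frame_len >= len(signal):
            break
        start += hop
    return frames


if __name__ == "__main__":
    # Toy signal of 10 samples, frames of 4 samples with 50% overlap.
    toy = [float(i) for i in range(10)]
    for frame in frame_signal(toy, frame_len=4, hop=2):
        print(frame)
```

Because every frame has the same length, a network with a fixed input size could process each frame independently, and the per-frame outputs could then be combined into one prediction for the whole recording. How the paper combines them is not described in the abstract.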

Highlights

  • Environmental sound classification is a sound recognition task that identifies sound events in the real world; it is also known as sound event recognition

  • To overcome the mentioned limitations, we propose a new one-dimensional end-to-end convolutional neural network (CNN) integrated with Bayesian optimization (BO) [51] and ensemble learning [23,52] methods that learns directly from the audio signal

  • The results show that the Bayesian optimization algo


Summary

Introduction

Environmental sound classification is a sound recognition task that identifies sound events in the real world; it is also known as sound event recognition. Sound classification problems have received considerable attention, with popular applications ranging from crime detection [1] to environmental context-aware processing [1], healthcare [2], automatic speech recognition [3], music information retrieval [4], noise mitigation [5], music classification [6], and smart audio-based surveillance systems [7,8]. Most environmental sound classification approaches, like typical automatic classification systems, depend on hand-crafted features or mid-level representations [6], which offer a good trade-off between model accuracy and computational cost [9,10,11,12,13,14,15,16,17,18,19,20,21,22]. Neural networks have been the primary choice for environmental sound recognition and have outperformed conventional classifiers in the last few years [13].

