Abstract

Audio Event Detection (AED) and the classification of acoustic events have become notable tasks for machines that interpret the auditory information around us. Nevertheless, extracting the basic characteristics that encapsulate the fundamental elements of an audio event remains difficult and cumbersome. Previous works on audio event classification relied on supervised pre-training as well as meta-learning approaches that depend on labeled data and therefore suffer from instability. As deep learning matures, applying deep learning methods to acoustic event detection has become increasingly sought after. This paper introduces a hybrid method, the Greedy Regression-based Convolutional Neural Network and Differential Convex Bidirectional Gated Recurrent Unit (GRCNN-DCBGRU), to learn a vector representation of an audio sequence for Audio Event Classification (AEC). The Differential Convex Bidirectional Gated Recurrent Unit is analogous to long short-term memory and captures time-cyclic long-term dependencies at lower processing complexity. The model first extracts acoustic features from the sound event dataset through the Differential Convex Bidirectional Gated Recurrent Unit using Gabor filter bank features, and then extracts local static acoustic features through the Greedy Regression-based Convolutional Neural Network using Mel Frequency Cepstral Coefficients (MFCC). Finally, a Differential Convex Meta-Learning classifier performs the acoustic event classification. Extensive evaluation on a large, publicly available acoustic event database, Findsounds2016, is performed in the Python programming language to demonstrate the efficiency of the proposed method for the AEC task. To illustrate the contribution of individual modules to overall representation learning for AEC, several metrics, including audio detection time, detection accuracy, precision, and recall, are measured.
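The abstract describes a two-branch architecture, a recurrent branch over temporal features and a convolutional branch over local time-frequency patterns, fused before classification. The sketch below illustrates that general structure in Python; the layer sizes, feature settings (MFCC via librosa), and the dense classifier head are illustrative assumptions standing in for the paper's differential-convex and greedy-regression components, not the authors' published configuration.

```python
# Minimal sketch of the two-branch GRCNN-DCBGRU idea described in the abstract.
# Hyperparameters and the fusion/classifier head are assumptions for illustration.
import librosa
from tensorflow.keras import layers, Model

def mfcc_features(path, n_mfcc=40):
    """Load an audio clip and compute an MFCC matrix (time_frames x n_mfcc)."""
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.T

def build_model(n_frames=128, n_feats=40, n_classes=10):
    inp = layers.Input(shape=(n_frames, n_feats))

    # Recurrent branch: bidirectional GRU capturing long-term temporal context
    # (a plain stand-in for the Differential Convex Bidirectional GRU).
    rnn = layers.Bidirectional(layers.GRU(64))(inp)

    # Convolutional branch: 2-D CNN over the time-frequency map capturing
    # local static patterns (a plain stand-in for the greedy-regression CNN).
    cnn = layers.Reshape((n_frames, n_feats, 1))(inp)
    cnn = layers.Conv2D(16, (3, 3), activation="relu")(cnn)
    cnn = layers.MaxPooling2D((2, 2))(cnn)
    cnn = layers.GlobalAveragePooling2D()(cnn)

    # Fuse both representations and classify the acoustic event.
    fused = layers.Concatenate()([rnn, cnn])
    out = layers.Dense(n_classes, activation="softmax")(fused)

    model = Model(inp, out)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```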
