Abstract

Everything we know is based on our brain's ability to process sensory data, and hearing is a crucial sense for learning. Sound is essential for a wide range of activities, such as exchanging information and interacting with others. Audio signals are the electrical representation of sound, and because of their countless essential applications, audio signal classification is of considerable value. Even today, however, classifying audio signals remains a difficult task. To classify audio signals more accurately and effectively, we propose a new model. In this study, we apply a method for audio classification that combines the strengths of Deep Convolutional Neural Network (DCNN) and Long Short-Term Memory (LSTM) models with a distinctive combination of feature engineering to obtain the best possible outcome. We integrate data augmentation and feature extraction before fitting the data to the model and evaluating its performance. The experiments show a higher degree of accuracy. To validate the efficacy of our model, we present a comparative analysis against recently published reference works.

Keywords: DCNN-LSTM; Spectrograms; Short-Time Fourier Transform; Data augmentation; Spectral feature extraction; MFCC; Mel spectrogram; Chroma STFT; Tonnetz
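The abstract names the short-time Fourier transform and mel-spectrogram among the extracted features. As an illustrative sketch only (the paper's actual pipeline is not shown here), a log-mel spectrogram can be computed from raw audio with NumPy alone; all parameter values (`n_fft=512`, `hop=128`, `n_mels=40`) are assumptions chosen for the demo, not values taken from the paper.

```python
import numpy as np

def stft(y, n_fft=512, hop=128):
    """Short-Time Fourier Transform magnitude via framing + windowed FFT."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop: i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T  # (n_fft//2 + 1, n_frames)

def mel_filterbank(sr, n_fft, n_mels=40):
    """Triangular mel filters mapping linear-frequency FFT bins to mel bands."""
    hz2mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel2hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz2mel(0.0), hz2mel(sr / 2), n_mels + 2)
    bin_pts = np.floor((n_fft + 1) * mel2hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bin_pts[m - 1], bin_pts[m], bin_pts[m + 1]
        for k in range(l, c):          # rising edge of the triangle
            fb[m - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):          # falling edge of the triangle
            fb[m - 1, k] = (r - k) / max(r - c, 1)
    return fb

def log_mel_spectrogram(y, sr, n_fft=512, hop=128, n_mels=40):
    S = stft(y, n_fft, hop) ** 2                # power spectrogram
    M = mel_filterbank(sr, n_fft, n_mels) @ S   # mel-band energies
    return np.log(M + 1e-10)                    # log compression

# Demo: a 1-second 440 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440 * t)
feat = log_mel_spectrogram(y, sr)
print(feat.shape)  # prints (40, 122): (n_mels, n_frames)
```

In practice a library such as librosa provides these features (MFCC, mel spectrogram, chroma STFT, tonnetz) directly; the sketch above only makes the STFT-to-mel step explicit. The resulting 2-D feature map is the kind of input typically fed to a DCNN-LSTM stack.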
