Abstract

Environmental sound classification (ESC) is gaining attention in information processing because of its importance for non-speech audio categorization. Unlike speech or music, ambient sounds lack a clear structure, which makes them difficult to categorize. Deep learning (DL) techniques are widely used in ESC to extract relevant information from ambient sounds, and feature extraction is crucial because it directly impacts classification performance. However, feature extraction can be computationally expensive, especially when complex non-linear techniques are employed, owing to the substantial computing resources required to train DL models. Moreover, environmental sounds vary considerably in their temporal and spectral characteristics, which makes DL models difficult to train effectively. To overcome these limitations, this research proposes an efficient method for extracting meaningful features from audio files and improving the performance of DL techniques using spectrogram images generated from environmental sound datasets. The proposed approach uses convolutional neural networks (CNNs) with attention mechanisms and suitable data augmentation. Unlike pre-trained models that rely on a single feature vector, it employs a new concatenation-based CNN model with attention mechanisms, which captures intricate relationships among input features more effectively. This design extracts features from different regions of the feature space, yielding more precise classification of complex and diverse datasets. The approach also leverages parallel feature extraction across multiple CNN models, which further improves classification performance, and attention modules focus the network on the most relevant features of the input. Comparative experiments against several existing state-of-the-art methods show that the proposed approach achieves better classification performance.
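To make the described pipeline concrete, the following is a minimal sketch of a concatenation-based CNN with attention operating on log-mel spectrogram "images". It assumes PyTorch; the two-branch layout, kernel sizes, 50-class output (as in the ESC-50 benchmark), and the squeeze-and-excitation-style channel attention are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch of a concatenation-based CNN with attention for ESC (PyTorch).
# Branch depths, kernel sizes, and the SE-style attention are assumptions.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style attention: reweights feature channels."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # emphasize the most relevant channels

def conv_branch(kernel_size: int) -> nn.Sequential:
    """One CNN branch; different kernel sizes see different time-frequency scales."""
    return nn.Sequential(
        nn.Conv2d(1, 32, kernel_size, padding=kernel_size // 2),
        nn.BatchNorm2d(32), nn.ReLU(inplace=True), nn.MaxPool2d(2),
        nn.Conv2d(32, 64, kernel_size, padding=kernel_size // 2),
        nn.BatchNorm2d(64), nn.ReLU(inplace=True),
        ChannelAttention(64),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # 64-dim feature vector per branch
    )

class ConcatCNN(nn.Module):
    """Extracts features from parallel branches and concatenates them."""
    def __init__(self, num_classes: int = 50):
        super().__init__()
        self.branch_a = conv_branch(kernel_size=3)
        self.branch_b = conv_branch(kernel_size=5)
        self.classifier = nn.Linear(64 + 64, num_classes)

    def forward(self, spectrogram):  # (batch, 1, mel_bins, frames)
        feats = torch.cat([self.branch_a(spectrogram),
                           self.branch_b(spectrogram)], dim=1)
        return self.classifier(feats)

# Usage on a batch of log-mel spectrogram "images":
model = ConcatCNN(num_classes=50)
logits = model(torch.randn(4, 1, 128, 256))  # -> shape (4, 50)
print(logits.shape)
```

The key design point the abstract describes is visible here: each branch produces its own feature vector from a different receptive-field scale, attention reweights the channels within each branch, and the vectors are concatenated before classification rather than relying on a single extraction vector.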
