Abstract

AbstractThe hike in the demand for smart cities has gathered the interest of researchers to work on environmental sound classification. Most researchers' goal is to reach the Bayesian optimal error in the field of audio classification. Nonetheless, it is very baffling to interpret meaning from a three‐dimensional audio and this is where different types of spectrograms become effective. Using benchmark spectral features such as mel frequency cepstral coefficients (MFCCs), chromagram, log‐mel spectrogram (LM), and so on audio can be converted into meaningful 2D spectrograms. In this paper, we propose a convolutional neural network (CNN) model, which is fabricated with additive angular margin loss (AAML), large margin cosine loss (LMCL) and a‐softmax loss. These loss functions proposed for face recognition, hold their value in the other fields of study if they are implemented in a systematic manner. The mentioned loss functions are more dominant than conventional softmax loss when it comes to classification task because of its capability to increase intra‐class compactness and inter‐class discrepancy. Thus, with MCAAM‐Net, MCAS‐Net and MCLCM‐Net models, a classification accuracy of 99.60%, 99.43% and 99.37% is achieved on UrbanSound8K dataset respectively without any augmentation. This paper also demonstrates the benefit of stacking features together and the above‐mentioned validation accuracies are achieved after stacking MFCCs and chromagram on the x‐axis. We also visualized the clusters formed by the embedded vectors of test data for further acknowledgement of our results, after passing it through different proposed models. Finally, we show that the MCAAM‐Net model achieved an accuracy of 99.60% on UrbanSound8K dataset, which outperforms the benchmark models like TSCNN‐DS, ADCNN‐5, ESResNet‐Attention, and so on that are introduced over the recent years.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.