Abstract

In this work, we combine a Siamese neural network and different clustering techniques to generate a dissimilarity space that is then used to train a support vector machine (SVM) for automated animal audio classification. The animal audio datasets used are (i) bird and (ii) cat sounds, both freely available. We exploit different clustering methods to reduce the spectrograms in the dataset to a set of centroids, which are used to generate the dissimilarity space through the Siamese network. Once the dissimilarity space is computed, it is used to generate a vector space representation of each pattern, which is then fed into the SVM to classify a spectrogram by its dissimilarity vector. Our study shows that the proposed dissimilarity-space approach performs well on both classification problems without ad hoc optimization of the clustering methods. Moreover, the results show that the fusion of CNN-based approaches applied to the animal audio classification problem works better than the stand-alone CNNs.
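
As a rough sketch of this pipeline (illustrative only, not the authors' implementation), the example below clusters training spectrograms into centroids, represents every pattern by its dissimilarities to those centroids, and trains an SVM on the resulting vectors. The siamese_distance placeholder stands in for the learned Siamese dissimilarity, and the data, centroid count, and kernel are invented for the example.

    # Illustrative sketch of the dissimilarity-space pipeline (not the authors' code).
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.svm import SVC

    def siamese_distance(a, b):
        # Placeholder: Euclidean distance stands in for the trained Siamese
        # network's learned dissimilarity between two spectrograms.
        return np.linalg.norm(a - b)

    def dissimilarity_vectors(samples, centroids):
        # Represent each pattern by its dissimilarity to every centroid.
        return np.array([[siamese_distance(s, c) for c in centroids] for s in samples])

    # Toy data standing in for flattened spectrograms and their class labels.
    rng = np.random.default_rng(0)
    train_X, train_y = rng.normal(size=(100, 256)), rng.integers(0, 2, 100)
    test_X = rng.normal(size=(20, 256))

    # 1. Cluster the training spectrograms into a small set of centroids.
    centroids = KMeans(n_clusters=10, n_init=10, random_state=0).fit(train_X).cluster_centers_

    # 2. Project every pattern into the dissimilarity space.
    train_D = dissimilarity_vectors(train_X, centroids)
    test_D = dissimilarity_vectors(test_X, centroids)

    # 3. Train an SVM on the dissimilarity vectors and classify each test spectrogram.
    predictions = SVC(kernel="rbf").fit(train_D, train_y).predict(test_D)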

Highlights

  • Sound classification and recognition have been applied in different domains, e.g., speech recognition [1], music classification [2], environmental sound recognition, and biometric identification [3]

  • We use the dissimilarity space to generate a vector space representation of each pattern, which is fed into a support vector machine (SVM) to classify a spectrogram by its dissimilarity vector

  • Results show that the fusion of convolutional neural network (CNN)-based approaches applied to the animal audio classification problem works better than the stand-alone CNNs


Summary

Introduction

Sound classification and recognition have been applied in different domains, e.g., speech recognition [1], music classification [2], environmental sound recognition, and biometric identification [3]. In pattern recognition problems, features have traditionally been extracted from the audio traces themselves (e.g., the Statistical Spectrum Descriptor and Rhythm Histogram [4]). By replacing audio traces with their visual representations, image classification techniques can be used to extract features for sound classification problems. The most common visual representation of an audio trace displays its frequency spectrum as it varies over time, as in spectrograms [5]; Mel-frequency Cepstral Coefficient (MFCC) spectrograms [6] are another widely used representation. Approaches of this kind have been explored, for example, by Costa et al. [8,9].
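
To make this image-based workflow concrete, the short sketch below converts an audio recording into a log-scaled mel spectrogram image using the librosa library; the file name and the spectrogram parameters (FFT size, hop length, number of mel bands) are placeholder values, not settings taken from the paper.

    # Minimal sketch: turning an audio trace into a mel spectrogram image
    # that image-based classifiers (e.g., CNNs) can consume.
    import numpy as np
    import librosa
    import librosa.display
    import matplotlib.pyplot as plt

    y, sr = librosa.load("bird_call.wav", sr=None)            # placeholder file name
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048,
                                       hop_length=512, n_mels=128)
    S_db = librosa.power_to_db(S, ref=np.max)                 # log-scaled magnitudes

    # Save the spectrogram as an image so it can be treated as input
    # to an image classifier.
    plt.figure(figsize=(4, 3))
    librosa.display.specshow(S_db, sr=sr, hop_length=512, x_axis="time", y_axis="mel")
    plt.axis("off")
    plt.savefig("bird_call_spectrogram.png", bbox_inches="tight", pad_inches=0)
    plt.close()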

