Abstract

Feature extraction for Acoustic Bird Species Classification (ABSC) tasks has traditionally been based on parametric representations that were specifically developed for speech signals, such as Mel Frequency Cepstral Coefficients (MFCC). However, the discrimination capabilities of these features for ABSC could be enhanced by accounting for the vocal production mechanisms of birds and, in particular, the spectro-temporal structure of bird sounds. In this paper, a new front-end for ABSC is proposed that incorporates this specific information through the non-negative decomposition of bird sound spectrograms. It consists of two stages: short-time feature extraction and temporal feature integration. In the first stage, which aims to provide a better spectral representation of bird sounds on a frame-by-frame basis, two methods are evaluated. In the first method, cepstral-like features (NMF_CC) are extracted using a filter bank that is automatically learned by applying Non-Negative Matrix Factorization (NMF) to bird audio spectrograms. In the second method, the features (H_CC) are derived directly from the activation coefficients of the NMF spectrogram decomposition. The second stage summarizes the most relevant information contained in the short-time features by computing several statistical measures over long segments. The experiments show that the use of NMF_CC and H_CC in conjunction with temporal integration significantly improves the performance of a Support Vector Machine (SVM)-based ABSC system with respect to conventional MFCC.

Highlights

  • Regarding the comparison between Mel Frequency Cepstral Coefficients (MFCC) and NMF_CC-based front-ends, NMF_CC and NMF_CC + Δ achieve a relative error reduction with respect to MFCC of approximately 6.25% and 19.63%, respectively, and the latter result is statistically significant. These results suggest that the filters automatically learned by the Non-Negative Matrix Factorization (NMF) algorithm are better suited to modeling bird vocal production than the mel-scaled filter bank, which is a better fit for the human speech production and auditory systems

  • We analyze the primary factors that explain the performance of the MFCC and NMF-based front-ends for the Acoustic Bird Species Classification (ABSC) task, which are the capability of each parameterization for achieving an adequate representation of bird sounds, the acoustic similarity between sounds belonging to different species and the presence of noise

  • The primary focus of this paper is on the short-time feature extraction module, in which two new parameterization schemes based on the non-negative decomposition of bird sound spectrograms through the application of the NMF algorithm are proposed

Summary

Methods

Non-Negative Matrix Factorization (NMF)

We provide a brief description of the mathematical foundations of Non-Negative Matrix Factorization because it provides the background for the two short-time feature extraction schemes proposed in this paper. Given a matrix $V \in \mathbb{R}_+^{F \times T}$, where each column is a data vector, NMF approximates it as the product of two non-negative low-rank matrices $W$ and $H$, such that

$$V \approx WH, \qquad (1)$$

where $W \in \mathbb{R}_+^{F \times K}$ and $H \in \mathbb{R}_+^{K \times T}$, and normally $K \ll \min(F, T)$. In this way, each column of $V$ can be written as a linear combination of the $K$ basis vectors (columns of $W$), weighted by the activation coefficients, or gains, located in the corresponding column of $H$. NMF can be seen as a dimensionality reduction of the data vectors from an $F$-dimensional space to a $K$-dimensional space. This is possible if the columns of $W$ uncover the latent structure in the data [27].
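As an illustration of the factorization of V into the product WH described above, the following is a minimal NumPy sketch. It assumes the classical multiplicative update rules for the Euclidean (Frobenius) cost; the text above does not state which cost function or update rule the paper uses, so that choice, the function name `nmf`, and the parameters `n_iter`, `eps` and `seed` are assumptions made purely for illustration. The toy matrix is synthetic and is not a bird sound spectrogram.

```python
import numpy as np


def nmf(V, K, n_iter=200, eps=1e-10, seed=0):
    """Approximate a non-negative F x T matrix V as W @ H.

    W is F x K (basis vectors in its columns) and H is K x T
    (activation coefficients, or gains). Assumption: multiplicative
    updates for the Frobenius cost; other costs are equally possible.
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    # Strictly positive random initialization keeps the updates well defined.
    W = rng.random((F, K)) + eps
    H = rng.random((K, T)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H
        # remain non-negative throughout.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy example: a synthetic non-negative matrix with F=20, T=50 and rank 4.
rng = np.random.default_rng(1)
V = rng.random((20, 4)) @ rng.random((4, 50))

W, H = nmf(V, K=4)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"W: {W.shape}, H: {H.shape}, relative error: {rel_err:.4f}")
```

Each column of `V` is approximated by `W @ H[:, t]`, that is, a combination of the K columns of `W` weighted by the gains in column `t` of `H`, which is the dimensionality reduction from F to K dimensions described in the text.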

