Abstract

Automated classification of calling bird species is useful for large-scale temporal and spatial environmental monitoring. In this paper, we investigate acoustic features, visual features, and deep learning for bird sound classification. In the deep learning approach, convolutional neural network layers learn generalized features and reduce dimensionality, while a conventional fully connected layer performs classification; these layers are combined into a unified end-to-end model for classifying calling bird species. For the visual and acoustic features, two traditional classifiers are compared on the bird sounds. Experimental results on 14 bird species indicate that our proposed deep learning method achieves the best F1-score of 94.36%, which is higher than the acoustic-features approach (88.97%) and the visual-features approach (88.87%). To further improve classification performance, a class-based late fusion method is explored. Our final best classification F1-score is 95.95%, obtained by the late fusion of the acoustic-features approach, the visual-features approach, and deep learning.
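
The class-based late fusion described above can be sketched as a weighted combination of per-class probabilities from the three classifiers. This is a minimal illustration, not the paper's implementation: the arrays, the per-class weight matrix, and the use of uniform weights are all assumptions for demonstration (in practice, per-class weights would be tuned on a validation set).

```python
import numpy as np

# Hypothetical per-class probabilities from three classifiers (acoustic
# features, visual features, deep learning) for 2 samples over 3 bird
# classes. Values are illustrative only, not taken from the paper.
p_acoustic = np.array([[0.6, 0.3, 0.1],
                       [0.2, 0.5, 0.3]])
p_visual   = np.array([[0.5, 0.4, 0.1],
                       [0.1, 0.6, 0.3]])
p_deep     = np.array([[0.7, 0.2, 0.1],
                       [0.3, 0.3, 0.4]])

# Class-based late fusion: each (classifier, class) pair gets its own
# weight. Uniform weights here for simplicity; rows index classifiers,
# columns index classes.
weights = np.full((3, 3), 1.0 / 3.0)

fused = (weights[0] * p_acoustic
         + weights[1] * p_visual
         + weights[2] * p_deep)

# Final prediction: class with the highest fused score per sample.
predictions = fused.argmax(axis=1)
print(predictions.tolist())  # → [0, 1]
```

Per-class weighting lets the fusion trust, say, the deep learning model more for species it separates well while leaning on the acoustic or visual classifiers for species where they are stronger.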
