Abstract

This paper addresses the task of identifying multiple bird species in audio recordings. The proposed approach uses a pre-trained deep convolutional neural network (DCNN), the VGG-16 model, to learn bird vocalizations through a sliding-window analysis of mel-spectrograms. An aggregation strategy is adopted to decide on each test file: the sigmoid outputs are aggregated and normalized, and the candidates with the maximum probability scores are taken to be the birds present in the recording. The proposed method is evaluated on the Xeno-canto bird sound database, using calls from 10 different species. Mel-spectrograms (visual features) generated from the bird calls serve as input to VGG-16. The performance is also compared with an MFCC-DNN approach. The proposed visualization-based system reports an average F1-score of 0.65 and outperforms the acoustic cue-based MFCC-DNN approach.
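
The following is a minimal sketch of the pipeline the abstract describes (sliding-window mel-spectrogram analysis, per-window sigmoid outputs, aggregation and normalization over the recording). The window width, hop size, sample rate, and the `predict_species` helper are illustrative assumptions, not values or names from the paper, and in practice the VGG-16 would start from ImageNet weights and be fine-tuned on the bird-call data.

```python
# Sketch only: window/hop sizes and preprocessing are assumed, not from the paper.
import librosa
import numpy as np
import torch
import torch.nn as nn
import torchvision.models as models

NUM_SPECIES = 10
WINDOW_FRAMES = 224   # sliding-window width over the mel-spectrogram (assumed)
HOP_FRAMES = 112      # 50% overlap between consecutive windows (assumed)

# VGG-16 with its 1000-way ImageNet head replaced by a per-species output layer.
# Weights are omitted here; the paper's model would be pre-trained and fine-tuned.
model = models.vgg16(weights=None)
model.classifier[6] = nn.Linear(4096, NUM_SPECIES)
model.eval()

def predict_species(wav_path, top_k=2):
    """Aggregate per-window sigmoid scores over one recording (hypothetical helper)."""
    y, sr = librosa.load(wav_path, sr=22050)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=224)
    mel_db = librosa.power_to_db(mel, ref=np.max)
    # Scale to [0, 1] so each window can be fed to VGG-16 as an image.
    mel_img = (mel_db - mel_db.min()) / (np.ptp(mel_db) + 1e-8)

    scores = torch.zeros(NUM_SPECIES)
    for start in range(0, max(1, mel_img.shape[1] - WINDOW_FRAMES + 1), HOP_FRAMES):
        patch = mel_img[:, start:start + WINDOW_FRAMES]
        if patch.shape[1] < WINDOW_FRAMES:  # zero-pad the last short window
            patch = np.pad(patch, ((0, 0), (0, WINDOW_FRAMES - patch.shape[1])))
        # Replicate the single-channel spectrogram to 3 channels for VGG-16.
        x = torch.from_numpy(patch).float().unsqueeze(0).repeat(3, 1, 1).unsqueeze(0)
        with torch.no_grad():
            scores += torch.sigmoid(model(x)).squeeze(0)  # accumulate sigmoid outputs

    probs = scores / scores.sum()  # normalize the aggregated scores
    return probs.topk(top_k)       # highest-scoring species candidates
```

The sum-then-normalize aggregation lets evidence from every window contribute, so a species that vocalizes in only part of the recording can still surface among the top candidates.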
