Abstract

Birds play a key role in ecological monitoring, serving as indicators of environmental health, and many biological monitoring projects rely on the acoustic detection of birds. Despite the growing availability of large datasets, this detection is often manual or semi-automatic, requiring manual tuning and post-processing. The most commonly used datasets for bird call monitoring and identification are the Cornell Bird Challenge (CBC) 2020 dataset [1] and the Xeno-Canto dataset [2]. The major models used are YamNet [3], a pre-trained model provided by the TensorFlow team that takes the waveform of a sound sample as input and predicts the probability of each class, and ResNet-50 [1], a deep CNN architecture for automated bird call recognition in which spectrograms (visual features) extracted from the bird calls with a deep CNN were used as input to ResNet-50. Dmitry Konovalov et al. [2] suggested two approaches. The first was a stand-alone model that used the whole audio clip as input to ImageNet, ResNet, and VGGNet models. The second was a hybrid model that applied sliding windows to the raw audio, taking the spectrogram of each window as input to a CNN for representation and an RNN for temporal correlation; the hybrid model achieved higher accuracy. Even though many researchers have tried to automate bird call recognition, the desired prediction accuracy has not been achieved: on average, reported accuracy fluctuates between 60% and 72%. Moreover, prediction quality depends mainly on the quality and quantity of the dataset, the training pattern, and the input type given to the model.

Keywords: Audio Classification, Bird Call Monitoring, Bird Call Identification, YamNet, ResNet-50, Deep CNN, RNN, Spectrogram, Temporal Correlation.
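
As a concrete illustration of the waveform-in, class-probabilities-out pipeline described above, the following is a minimal sketch of YamNet inference, assuming the publicly released TensorFlow Hub version of the model; the random waveform is a stand-in for a real bird recording, not data from the cited studies.

```python
# Minimal YamNet inference sketch (assumes the TensorFlow Hub release).
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

# Load the pre-trained YamNet model from TensorFlow Hub.
model = hub.load('https://tfhub.dev/google/yamnet/1')

# YamNet expects a mono waveform sampled at 16 kHz, as a 1-D float32
# tensor with values in [-1.0, 1.0]. One second of noise stands in here.
waveform = np.random.uniform(-1.0, 1.0, 16000).astype(np.float32)

# The model returns per-frame class scores, embeddings, and the log-mel
# spectrogram it computes internally from the waveform.
scores, embeddings, spectrogram = model(waveform)

# Average the frame-level scores and take the most likely of the
# 521 AudioSet classes.
mean_scores = tf.reduce_mean(scores, axis=0)
print('Predicted class index:', tf.argmax(mean_scores).numpy())
```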
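
The hybrid sliding-window approach attributed to Konovalov et al. [2] can likewise be sketched in Keras: a CNN extracts a feature vector from each window's spectrogram, and an RNN models the temporal correlation across the window sequence. The window count, spectrogram shape, layer sizes, and class count below are illustrative assumptions, not values from the cited paper.

```python
# Hybrid CNN+RNN sketch: per-window spectrogram features from a CNN,
# temporal correlation across windows from a GRU. All shapes assumed.
import tensorflow as tf

NUM_WINDOWS = 10           # sliding windows per clip (assumed)
SPEC_SHAPE = (128, 64, 1)  # mel bins x frames x channels per window (assumed)
NUM_CLASSES = 50           # number of bird species (assumed)

# CNN applied to a single window's spectrogram.
cnn = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=SPEC_SHAPE),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation='relu'),
    tf.keras.layers.GlobalAveragePooling2D(),
])

# TimeDistributed runs the CNN over every window; the GRU then captures
# temporal structure across the resulting feature sequence.
model = tf.keras.Sequential([
    tf.keras.layers.TimeDistributed(cnn, input_shape=(NUM_WINDOWS, *SPEC_SHAPE)),
    tf.keras.layers.GRU(64),
    tf.keras.layers.Dense(NUM_CLASSES, activation='softmax'),
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
```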
