Bird detection is vital for the conservation of avian biodiversity, since it allows ornithologists to quantify which species are present in a particular area. Analyzing their acoustic signals enables the efficient identification of multiple bird species from overlapping recordings. This paper addresses the classification of bird vocalizations in real-time audio recordings using acoustic analysis. The proposed work presents schemes based on recurrent neural networks (RNN). Gated recurrent units (GRU) are a type of RNN that have shown remarkable performance in acoustic classification. We propose a hierarchical attention-based bidirectional gated recurrent unit (BiGRU) model for classifying the acoustic signals of birds using Mel-frequency cepstral coefficients (MFCC). The attention mechanism has proven effective in many acoustic, speech, and music processing applications; here it is employed to assign different weights to the outputs of the BiGRU hidden layers. To make decisions on test data, we adopt a short-time sliding-aggregation approach in which the probability outputs are summed per species and normalized. The species with the highest probability score is taken as the dominant species in the recording. Our Attention-BiGRU classifier achieves strong performance on the Xeno-Canto dataset, with an F1-score of 0.84, competitive with state-of-the-art multi-label classifiers.
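As a rough illustration of the approach described above, the sketch below shows an attention-based BiGRU over MFCC frames together with the sliding-window aggregation of per-segment probabilities. It is a minimal sketch, not the authors' implementation: PyTorch, the layer sizes, the number of MFCC coefficients, the additive form of the attention, and the helper names (AttentionBiGRU, aggregate_segments) are all assumptions introduced for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionBiGRU(nn.Module):
    """BiGRU over MFCC frames with an additive attention pooling layer (illustrative)."""

    def __init__(self, n_mfcc=40, hidden_size=128, n_species=10):
        super().__init__()
        self.bigru = nn.GRU(n_mfcc, hidden_size, batch_first=True, bidirectional=True)
        # Additive attention: score each frame's hidden state, softmax over time.
        self.attn = nn.Linear(2 * hidden_size, 1)
        self.classifier = nn.Linear(2 * hidden_size, n_species)

    def forward(self, x):
        # x: (batch, frames, n_mfcc) MFCC sequence for one short segment
        h, _ = self.bigru(x)                      # (batch, frames, 2*hidden)
        scores = self.attn(torch.tanh(h))         # (batch, frames, 1)
        weights = F.softmax(scores, dim=1)        # attention weights over frames
        context = (weights * h).sum(dim=1)        # weighted sum of hidden states
        return self.classifier(context)           # (batch, n_species) logits


def aggregate_segments(model, segments):
    """Sliding-window decision: sum per-segment probabilities per species,
    then normalize; the top-scoring species is taken as dominant."""
    with torch.no_grad():
        probs = F.softmax(model(segments), dim=-1)  # (n_segments, n_species)
    summed = probs.sum(dim=0)
    return summed / summed.sum()


if __name__ == "__main__":
    model = AttentionBiGRU()
    # Hypothetical recording split into 12 segments of 200 MFCC frames each.
    segments = torch.randn(12, 200, 40)
    species_scores = aggregate_segments(model, segments)
    print("dominant species index:", species_scores.argmax().item())
```

In this sketch, attention pooling replaces a simple mean over time, so frames that the network scores as more informative contribute more to the segment-level decision, matching the role the abstract assigns to the attention mechanism.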