Detection and identification of European woodpeckers with deep convolutional neural networks

Juliette Florentin, Thierry Dutoit, Olivier Verlinden

doi:10.1016/j.ecoinf.2019.101023

Abstract

Every spring, European forest soundscapes fill up with the drums and calls of woodpeckers as they draw territories and pair up. Each drum or call is species-specific and easily picked up by a trained ear. In this study, we worked toward automating this identification and thus toward making the continuous acoustic monitoring of woodpeckers practical. We recorded from March to May successively in Belgium, Luxembourg and France, collecting hundreds of gigabytes of data. We discarded 50–80% of these recordings using the Acoustic Complexity Index (ACI). Then, for both the detection of the target signals in the audio stream and the identification of the different species, we implemented transfer learning from computer vision to audio analysis. This meant transforming sounds into images via spectrograms and retraining publicly available deep image networks (e.g. Inception) to work with such data. The visual patterns produced by drums (vertical lines) and call syllables (hats, straight lines, waves, etc.) in spectrograms are characteristic and allow the signals to be identified. We retrained using data from Xeno-Canto, Tierstimmen and a private collection. In the subsequent analysis of the field recordings, the repurposed networks gave outstanding results for the detection of drums (either 0.2–9.9% false positives or, for the toughest dataset, a reduction from 28,601 images to 1000 images left for manual review) and for the detection and identification of calls (73.5–100.0% accuracy; in the toughest case, a dataset reduction from 643,901 images to 14,667 images). However, they performed less well for the identification of drums than a simpler method using handcrafted features and the k-Nearest Neighbor (k-NN) classifier. The species character in drums does not lie in shapes but in temporal patterns: speed, acceleration, number of strikes and duration of the drums. These features are secondary information in spectrograms, and the image networks, which have learned invariance to object size, tend to disregard them. At locations where the species drummed abundantly, the accuracy of the networks was 83.0% for Picus canus (93.1% for k-NN) and 36.1% for Dryocopus martius (81.5% for k-NN). For the three field locations we produced timelines of the encountered woodpecker activity (6 species, 11 signals).
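
To make the handcrafted-feature approach concrete, the following is a minimal sketch of how the four temporal descriptors named above (speed, acceleration, number of strikes, duration) could be computed from strike onset times and passed to a k-NN vote. It is not the authors' implementation: the function names, the acceleration proxy (ratio of last to first inter-strike interval), the synthetic drums and the labels `species_A` and `species_B` are all assumptions made for illustration, and a real pipeline would first have to extract strike times from audio and would likely rescale the features.

```python
from collections import Counter
from math import dist


def drum_features(strike_times):
    """Summarise one drum from its strike onset times (in seconds)."""
    if len(strike_times) < 3:
        raise ValueError("need at least three strikes")
    intervals = [b - a for a, b in zip(strike_times, strike_times[1:])]
    duration = strike_times[-1] - strike_times[0]
    speed = len(intervals) / duration  # strikes per second
    # Assumed acceleration proxy: last interval over first interval
    # (below 1 means the drum speeds up toward its end).
    acceleration = intervals[-1] / intervals[0]
    return (speed, acceleration, float(len(strike_times)), duration)


def knn_predict(train, query, k=3):
    """Majority vote among the k labelled vectors nearest to the query."""
    # Features are left unscaled here for brevity; in practice they
    # would usually be normalised so that no single one dominates.
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def synthetic_drum(n_strikes, first_interval, ratio):
    """Hypothetical drum: each interval is `ratio` times the previous."""
    times = [0.0]
    interval = first_interval
    for _ in range(n_strikes - 1):
        times.append(times[-1] + interval)
        interval *= ratio
    return times


# Made-up training set: long steady drums vs. short accelerating drums.
train = [
    (drum_features(synthetic_drum(20, 0.060, 1.00)), "species_A"),
    (drum_features(synthetic_drum(22, 0.055, 1.00)), "species_A"),
    (drum_features(synthetic_drum(18, 0.065, 1.00)), "species_A"),
    (drum_features(synthetic_drum(12, 0.080, 0.95)), "species_B"),
    (drum_features(synthetic_drum(10, 0.085, 0.94)), "species_B"),
    (drum_features(synthetic_drum(11, 0.080, 0.95)), "species_B"),
]

steady_query = drum_features(synthetic_drum(19, 0.060, 1.00))
accelerating_query = drum_features(synthetic_drum(11, 0.082, 0.95))
print(knn_predict(train, steady_query))
print(knn_predict(train, accelerating_query))
```

The point of the sketch is that all four descriptors depend only on when the strikes occur, not on what the drum looks like in a spectrogram, which is why a classifier fed with them can exploit information that a size-invariant image network tends to discard.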

Full Text