Abstract

The use of autonomous recordings of animal sounds to detect species is a popular conservation tool, constantly improving in fidelity as audio hardware and software evolve. Current classification algorithms utilise sound features extracted from the recording rather than the sound itself, with varying degrees of success. Neural networks that learn directly from the raw sound waveform have been implemented in human speech recognition, but the requirement for detailed labelled data has limited their use in bioacoustics. Here we test SincNet, an efficient neural network architecture that learns from the raw waveform using sinc-based filters. Results from an off-the-shelf implementation of SincNet on a publicly available bird sound dataset (NIPS4Bplus) show that the network converged rapidly, reaching accuracies of over 65% with limited data. After hyperparameter tuning its performance is comparable with that of traditional methods, but it is more efficient. Learning directly from the raw waveform allows the algorithm to select automatically those elements of the sound best suited to the task, bypassing the onerous step of choosing feature extraction techniques and reducing possible biases. We use publicly released code and datasets to encourage others to replicate our results and to apply SincNet to their own datasets, and we review possible enhancements in the hope that algorithms that learn from the raw waveform will become useful bioacoustic tools.

Highlights

  • The use of autonomous recordings of animal sounds to detect species is a popular conservation tool, constantly improving in fidelity as audio hardware and software evolves

  • More recently, the focus has shifted toward the use of deep learning methods such as convolutional neural networks (CNNs)[7]

  • Calculations of receiver operating characteristic (ROC) area under the curve (AUC) averaged 75.6% over 30 trained models, while the accuracy over the same models averaged 60%

Introduction

Advances in digital sound recording hardware and storage have led to the widespread use of autonomous recording units: digital sound recorders that, with minimal servicing requirements, can be deployed in the field for weeks to months (or indefinitely) and acquire large amounts of acoustic data. Processing bioacoustic data comes with inherent challenges[1,8]: overlapping target sounds, environmental and background noise, variation in sound power due to varying distances between source and recorder, and the variability of calls even within the same species. Another challenge is the lack of adequately labelled datasets with which to train software. More recently, the focus has shifted toward deep learning methods such as convolutional neural networks (CNNs)[7].
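As context for the sinc-based front end mentioned in the abstract, the sketch below illustrates the kind of filter SincNet parametrises: a band-pass filter formed as the difference of two low-pass sinc functions, where only the two cut-off frequencies are free parameters. This is a minimal illustration, not the authors' implementation; the filter length, sampling rate, and cut-off values here are arbitrary assumptions for demonstration.

```python
import numpy as np

def sinc_bandpass(f1, f2, length=251, fs=16000):
    """Band-pass FIR filter built as the difference of two ideal low-pass
    sinc filters; in a SincNet-style layer only the cut-offs f1 < f2 (Hz)
    would be learned by backpropagation."""
    n = np.arange(-(length // 2), length // 2 + 1) / fs   # tap times in seconds
    lowpass = lambda fc: 2 * fc * np.sinc(2 * fc * n)     # ideal low-pass response
    return (lowpass(f2) - lowpass(f1)) * np.hamming(length)  # window reduces ripple

# Two-tone test signal: 440 Hz (outside the band) plus 4 kHz (inside the band)
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t) + np.sin(2 * np.pi * 4000 * t)

h = sinc_bandpass(2000, 6000, fs=fs)   # pass roughly 2-6 kHz
y = np.convolve(x, h, mode="same")     # filtered waveform retains the 4 kHz tone
```

In SincNet a bank of such filters forms the first convolutional layer, so the network adjusts its own frequency bands during training rather than relying on a fixed, hand-chosen feature extraction step.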

