Abstract

Ecoacoustics is an emerging science that seeks to understand the role of sound in ecological processes. Passive acoustic monitoring is being used to collect vast quantities of soundscape audio recordings to study variation in acoustic communities and to monitor biodiversity. However, extracting relevant information from soundscape recordings is non-trivial. Recent approaches to machine-learned acoustic features appear promising but are limited by at least three issues: inductive biases, lack of interpretability and crude temporal integration. In this paper we introduce a novel self-supervised representation learning algorithm for ecoacoustics, a convolutional Variational Auto-encoder (VAE), and directly address these shortcomings. Firstly, we train the network on soundscape recordings from temperate and tropical field sites along a gradient of ecological degradation to provide a more relevant inductive bias than prior approaches. Secondly, we present a new method that allows interpretation of the latent space for the first time, giving insight into the basis of classification. Thirdly, we advance existing methods for temporal aggregation of learned embeddings by encoding latent features as a distribution over time. Our interpretability method shows how learned features drive habitat classification: inspection of the latent space confirms that varying combinations of biophony, geophony and anthrophony are used to infer sites along a degradation gradient. Our novel temporal encoding method increases sensitivity to periodic signals and improves on previous research that uses time-averaged representations for site classification. This approach also reveals the contribution of hardware-specific frequency responses that create a potential bias; we demonstrate how a simple linear transformation can be used to mitigate the effect of hardware variance on the learned representation. Our approach paves the way for the development of a new class of deep neural networks that afford more interpretable learned ecoacoustic representations to advance both fundamental and applied science and support global conservation efforts.
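
To illustrate why a time-averaged representation can miss periodic signals, the following is a minimal sketch and not the method described in the paper. It assumes a per-frame latent sequence of shape (frames, dimensions) and uses mean plus standard deviation over time as a simple stand-in for a distribution over time. The function names `mean_pool` and `distribution_pool` and the toy data are hypothetical.

```python
import numpy as np

def mean_pool(z):
    """Time-averaged embedding: one value per latent dimension."""
    return z.mean(axis=0)

def distribution_pool(z):
    """Summarise each latent dimension by its mean and standard deviation over time.

    This is an illustrative stand-in for a distribution over time,
    not the encoding used by the authors.
    """
    return np.concatenate([z.mean(axis=0), z.std(axis=0)])

# Hypothetical latent trajectories: 8 frames, 1 latent dimension.
steady = np.full((8, 1), 0.5)                # constant activation
periodic = np.tile([[0.0], [1.0]], (4, 1))   # alternating (periodic) activation

# Both sequences have the same time average, so mean pooling cannot separate them.
same_under_mean = np.allclose(mean_pool(steady), mean_pool(periodic))

# Keeping the spread over time separates the steady and periodic sequences.
same_under_dist = np.allclose(distribution_pool(steady), distribution_pool(periodic))
```

In this toy case `same_under_mean` is true and `same_under_dist` is false: the two sequences collapse to the same point under averaging but remain distinct once temporal spread is retained.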
