Abstract
The collection and annotation of bioacoustic data present a number of challenges to researchers, often constraining analysis to highly vocal species. Computational tools allow monitoring to be extended to less vocal and more challenging species, but data limitations remain an issue. We present a human-in-the-loop approach that combines the efficiency of computational tools with the accuracy of human analysis. We use a wavelet-based segmentation method that automatically extracts transient features from field recordings; it can reduce the data by up to 90% and requires as few as one reference feature. The segmented features are then used to fine-tune a transformer-based model, the audio spectrogram transformer (AST). The model's output is verified by a human, and the corrected data are fed back into the model to improve performance over time. We also present an outlier detection approach based on Mel-frequency cepstral coefficients (MFCCs): the coefficients are projected to 2-D, and outliers are detected using the silhouette score. The approach achieved 98.8% validation accuracy on a binary classification task using a limited dataset of 200 five-minute recordings with sparse features (occurrence rates of less than 1%). It makes real-time bioacoustic monitoring of less vocal species a possibility.
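The outlier detection step can be illustrated with a minimal sketch. The abstract does not say how the MFCCs are projected to 2-D or how the silhouette score is thresholded, so the code below makes several assumptions: the points are already 2-D, every point already carries a cluster label, there are at least two clusters, and a point is treated as an outlier when its per-sample silhouette falls below a hypothetical threshold of 0. The function names `silhouette_samples` and `flag_outliers` are illustrative and are not taken from the paper.

```python
import math


def silhouette_samples(points, labels):
    """Per-point silhouette s = (b - a) / max(a, b) for labelled 2-D points.

    a is the mean distance to the other points in the same cluster,
    b is the smallest mean distance to the points of any other cluster.
    Assumes at least two distinct labels.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def flag_outliers(points, labels, threshold=0.0):
    """Return indices whose silhouette is below `threshold` (assumed cut-off)."""
    scores = silhouette_samples(points, labels)
    return [i for i, s in enumerate(scores) if s < threshold]


# Toy data standing in for 2-D projected MFCC vectors (not real recordings).
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),          # cluster "a"
    (10, 10), (10, 11), (11, 10), (11, 11),  # cluster "b"
    (9, 9),                                  # labelled "a" but sits next to "b"
]
labels = ["a", "a", "a", "a", "b", "b", "b", "b", "a"]

outliers = flag_outliers(points, labels)
```

In this toy case only the last point is flagged, because it lies far closer to the other cluster than to its own. In a human-in-the-loop setting, flagged segments like this one would be natural candidates for manual review.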