Abstract

Passive acoustic monitoring – an approach that utilizes autonomous acoustic recording units – allows for non-invasive monitoring of individuals, assuming that it is possible to acoustically distinguish individuals. However, identifying effective analytical approaches for individual identification remains a challenge. Our study investigates how the use of different feature representations impacts our ability to distinguish between individual female Northern grey gibbons (Hylobates funereus). We broadcast pre-recorded calls from twelve gibbon females and re-recorded the calls at varying distances (directly under the tree to ∼400 m away) using autonomous recording units. We evaluated the effectiveness of using different automated feature extraction approaches to classify gibbon calls: Mel-frequency cepstral coefficients (MFCCs), embeddings from three pre-trained neural networks (BirdNET, VGGish, and Wav2Vec2), and four commonly used acoustic indices. We used a supervised classification approach (random forest) to classify calls to the respective female and compared two unsupervised clustering approaches (affinity propagation clustering and hierarchical density-based spatial clustering) to evaluate which features were most effective for distinguishing female calls without using class labels. We used MFCCs as a baseline as previous work has shown they can be used to distinguish high-quality calls of individual gibbon females. Human annotators could only identify calls in spectrograms from recordings <350 m from the playback speaker with signal-to-noise ratio ∼ 0 dB, so our results focus on these recordings. Using supervised classification, our results confirmed the efficiency of MFCCs and the use of embeddings from one neural network (BirdNET) for effective acoustic classification of gibbon individuals at closer recording distances (signal-to-noise ratio > 10 dB), while the remaining features did not perform well. Contrary to our expectations, we found that MFCCs outperformed all other features for the unsupervised clustering tasks at closer distances and none of the features performed well at farther distances. The ability to acoustically discriminate animals under noisy conditions and from low signal-to-noise ratio calls has important implications for monitoring populations of endangered animals, such as gibbons. Focusing only on high signal-to-noise ratio calls for individual discrimination may not be possible for rare sounds, and future work should focus on developing effective approaches of feature extraction that can perform well across noisy, real-world conditions with a limited number of training samples.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call