Mel-frequency cepstral coefficients outperform embeddings from pre-trained convolutional neural networks under noisy conditions for discrimination tasks of individual gibbons

Mohamed Walid Lakdari, Abdul Hamid Ahmad, Sarab Sethi, Gabriel A Bohn, Dena J Clink

doi:10.1016/j.ecoinf.2023.102457

Abstract

Passive acoustic monitoring – an approach that utilizes autonomous acoustic recording units – allows for non-invasive monitoring of individuals, assuming that it is possible to acoustically distinguish individuals. However, identifying effective analytical approaches for individual identification remains a challenge. Our study investigates how the use of different feature representations impacts our ability to distinguish between individual female Northern grey gibbons (Hylobates funereus). We broadcast pre-recorded calls from twelve gibbon females and re-recorded the calls at varying distances (directly under the tree to ∼400 m away) using autonomous recording units. We evaluated the effectiveness of using different automated feature extraction approaches to classify gibbon calls: Mel-frequency cepstral coefficients (MFCCs), embeddings from three pre-trained neural networks (BirdNET, VGGish, and Wav2Vec2), and four commonly used acoustic indices. We used a supervised classification approach (random forest) to classify calls to the respective female and compared two unsupervised clustering approaches (affinity propagation clustering and hierarchical density-based spatial clustering) to evaluate which features were most effective for distinguishing female calls without using class labels. We used MFCCs as a baseline as previous work has shown they can be used to distinguish high-quality calls of individual gibbon females. Human annotators could only identify calls in spectrograms from recordings <350 m from the playback speaker with signal-to-noise ratio ∼ 0 dB, so our results focus on these recordings. Using supervised classification, our results confirmed the efficiency of MFCCs and the use of embeddings from one neural network (BirdNET) for effective acoustic classification of gibbon individuals at closer recording distances (signal-to-noise ratio > 10 dB), while the remaining features did not perform well. 
Contrary to our expectations, we found that MFCCs outperformed all other features for the unsupervised clustering tasks at closer distances, and none of the features performed well at farther distances. The ability to acoustically discriminate animals under noisy conditions and from low signal-to-noise ratio calls has important implications for monitoring populations of endangered animals, such as gibbons. Focusing only on high signal-to-noise ratio calls for individual discrimination may not be possible for rare sounds, and future work should focus on developing feature extraction approaches that perform well across noisy, real-world conditions with a limited number of training samples.
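
The abstract reports signal-to-noise ratio thresholds (∼0 dB and >10 dB) but does not state how the ratio was computed. As a rough illustration only, the sketch below uses one common definition, 10·log10(P_signal / P_noise), where P is mean squared amplitude. The function names (`mean_power`, `snr_db`), the 16 kHz sample rate, and the sine waves standing in for a call and for background noise are all assumptions made for this example and are not taken from the study.

```python
import math


def mean_power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    return 10.0 * math.log10(mean_power(signal) / mean_power(noise))


# Hypothetical one-second clips at an assumed 16 kHz sample rate.
rate = 16000
# A 1 kHz sine stands in for the call (illustrative, not a gibbon recording).
call = [math.sin(2 * math.pi * 1000 * n / rate) for n in range(rate)]
# A 300 Hz sine of equal amplitude stands in for background noise.
loud_noise = [math.sin(2 * math.pi * 300 * n / rate) for n in range(rate)]
# The same noise scaled so its power is one tenth of the call's power.
quiet_noise = [x / math.sqrt(10) for x in loud_noise]

snr_far = snr_db(call, loud_noise)    # equal power, so about 0 dB
snr_near = snr_db(call, quiet_noise)  # tenfold power ratio, so about 10 dB
```

Under this definition, 0 dB means the call carries the same power as the background, and 10 dB means it carries ten times as much, which gives a sense of how much weaker the distant recordings were.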

Journal: Ecological Informatics	Publication Date: Jan 4, 2024
Citations: 3	License type: cc-by-nc
