Deep audio embeddings for vocalisation clustering.

Paul Best, Ricard Marxer, Hervé Glotin, Sébastien Paris

doi:10.1371/journal.pone.0283396

Abstract

The study of non-human animals' communication systems generally relies on the transcription of vocal sequences using a finite set of discrete units. This set is referred to as a vocal repertoire, which is specific to a species or a sub-group of a species. When conducted by human experts, the formal description of vocal repertoires can be laborious and/or biased. This motivates computerised assistance for this procedure, for which machine learning algorithms represent a good opportunity. Unsupervised clustering algorithms are suited for grouping close points together, provided that a relevant representation is available. This paper therefore studies a new method for encoding vocalisations, allowing for automatic clustering to ease vocal repertoire characterisation. Borrowing from deep representation learning, we use a convolutional auto-encoder network to learn an abstract representation of vocalisations. We report on the quality of the learnt representation, as well as that of state-of-the-art methods, by quantifying their agreement with expert-labelled vocalisation types from 8 datasets of other studies across 6 species (birds and marine mammals). With this benchmark, we demonstrate that using auto-encoders improves the relevance of vocalisation representations, which serves repertoire characterisation while requiring only a very limited number of settings. We also publish a Python package for the bioacoustic community to train their own vocalisation auto-encoders or use a pretrained encoder to browse vocal repertoires and ease unit-wise annotation.
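
The abstract does not say which measure is used to quantify agreement between an automatic clustering and expert labels. As an illustration only, the sketch below assumes normalised mutual information (NMI), a common choice for comparing two partitions of the same items. The function name, the arithmetic-mean normalisation, and the toy labels are all hypothetical and are not taken from the paper or its Python package.

```python
import math
from collections import Counter


def normalized_mutual_info(labels_a, labels_b):
    """NMI between two labelings of the same items.

    Assumption: mutual information is normalised by the arithmetic mean
    of the two entropies. Returns a value in [0, 1].
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)
    if n == 0:
        raise ValueError("labelings must be non-empty")

    count_a = Counter(labels_a)                  # sizes of groups in labeling A
    count_b = Counter(labels_b)                  # sizes of groups in labeling B
    count_ab = Counter(zip(labels_a, labels_b))  # joint co-occurrence counts

    def entropy(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    h_a, h_b = entropy(count_a), entropy(count_b)

    # I(A; B) = sum over (a, b) of p(a, b) * log(p(a, b) / (p(a) * p(b)))
    mi = sum(
        (c / n) * math.log((c * n) / (count_a[a] * count_b[b]))
        for (a, b), c in count_ab.items()
    )

    denom = (h_a + h_b) / 2
    # Convention chosen here: two single-group labelings agree trivially.
    return 1.0 if denom == 0 else mi / denom


# Toy data: expert-assigned vocalisation types and two candidate clusterings.
expert = ["whistle", "whistle", "click", "click"]
matching_clusters = [0, 0, 1, 1]    # same partition, different names
unrelated_clusters = [0, 1, 0, 1]   # statistically independent of the expert labels

score_matching = normalized_mutual_info(expert, matching_clusters)
score_unrelated = normalized_mutual_info(expert, unrelated_clusters)
```

Because NMI depends only on how items are grouped, not on what the groups are called, it can compare arbitrary cluster indices against named vocalisation types without any label matching step.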

Journal: PLOS ONE
Publication Date: Jul 10, 2023
Citations: 9
License type: CC BY 4.0

