Abstract

Delphinids (family Delphinidae) are known to produce a rich variety of vocalizations whose function and form are not yet fully understood. The ability to reliably identify the vocal repertoire of delphinid species will permit further exploration of the role acoustic signals play in delphinid social structure and behavior. This work applies contemporary techniques from human speech recognition and computer vision to automatically classify the vocal repertoire of wild spotted dolphins (Stenella attenuata), using acoustic data collected from hydrophones mounted directly on free-swimming animals. The performance of a variety of machine learning algorithms, including support vector machines (SVMs), random forests, dynamic time warping, and convolutional neural networks, is quantified using data features drawn from both cetacean acoustics and human speech recognition. We demonstrate that the best performance is achieved using acoustic spectrogram features in conjunction with a convolutional neural network (CNN), a machine learning technique known to produce state-of-the-art results on image classification tasks. Even on a relatively small dataset (~350 labeled vocalizations), the CNN achieves 86% classification accuracy across eleven distinct vocalization types, with ground-truth labels provided by domain scientists. This work demonstrates that a complex, parametric model such as a CNN can be effectively applied to small-data domains in ecology to achieve state-of-the-art performance on acoustic classification problems.
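For illustration only, the sketch below shows what a small CNN classifier over log-spectrogram inputs for an eleven-class vocalization task might look like; it is not the authors' actual model, and the input shape, layer sizes, and class count placement are assumptions chosen for demonstration.

```python
# Minimal sketch (assumptions, not the paper's architecture): a small CNN that
# maps single-channel log-spectrogram patches to eleven vocalization classes.
import torch
import torch.nn as nn


class SpectrogramCNN(nn.Module):
    def __init__(self, n_classes: int = 11):
        super().__init__()
        # Three conv/pool stages halve the 128x128 input three times (to 16x16).
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, 128), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(128, n_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: batch of log-spectrograms, shape (N, 1, 128, 128) -- an assumed size.
        return self.classifier(self.features(x))


if __name__ == "__main__":
    model = SpectrogramCNN()
    dummy = torch.randn(4, 1, 128, 128)  # four fake spectrogram patches
    logits = model(dummy)                # shape (4, 11), one logit per class
    print(logits.shape)
```

With a dataset of only a few hundred labeled examples, such a model would typically be trained with heavy regularization (dropout, weight decay) and data augmentation of the spectrograms; the specific settings used in the paper are not given in the abstract.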
