Abstract

Whale vocal calls contain valuable information and abundant characteristics that are important for the classification of whale sub-populations and for related biological research. In this study, an effective data-driven approach based on pre-trained Convolutional Neural Networks (CNNs) using multi-scale waveforms and time-frequency feature representations is developed to classify whale calls from a large open-source dataset recorded by sensors carried by whales. Specifically, the classification is carried out through transfer learning using pre-trained state-of-the-art CNN models from the field of computer vision. 1D raw waveforms and 2D log-mel features of the whale-call data are used, respectively, as the inputs to the CNN models. For raw-waveform input, windows are applied to capture multiple sketches of a whale-call clip at different time scales, and the features from the different sketches are stacked for classification. For the log-mel features, the delta and delta-delta features are also calculated to produce a 3-channel feature representation for analysis. During training, a 4-fold cross-validation technique is employed to reduce overfitting, while the Mix-up technique is applied for data augmentation in order to further improve system performance. The results show that the proposed method improves classification accuracy over 16 whale pods by more than 20 percentage points compared with the baseline method, which uses groups of 2D shape descriptors of spectrograms and Fisher discriminant scores on the same dataset. Moreover, classifications based on log-mel features achieve higher accuracies than those based directly on raw waveforms. A phylogeny graph is also produced to illustrate the relationships among the whale sub-populations.
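The Mix-up augmentation mentioned above blends pairs of training examples and their labels with a weight drawn from a Beta distribution. As a minimal NumPy sketch (not the authors' implementation; the function name, the default `alpha`, and the use of one-hot label vectors are illustrative assumptions):

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Blend two training examples and their one-hot labels.

    A mixing weight lam ~ Beta(alpha, alpha) is sampled, and both the
    inputs and the labels are combined as convex mixtures, producing a
    'virtual' training example that lies between the two originals.
    """
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y
```

In practice the same mixture would be applied per mini-batch to waveform clips or log-mel feature maps before they are fed to the CNN.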

Highlights

  • Acoustic methods are an established technique to monitor marine mammal populations and their behaviors

  • Classification of killer whale and pilot whale calls is of great importance

  • In each cross-validation fold, 5960 samples from the development dataset are used to train the Convolutional Neural Network (CNN) models, while 1497 samples are used for testing
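The k-fold splitting behind the last highlight can be sketched as follows. This is a generic 4-fold partition in NumPy (not the authors' code; the function name and the random shuffling are illustrative assumptions, and the exact 5960/1497 split in the paper follows from their dataset's size):

```python
import numpy as np

def four_fold_splits(n_samples, n_folds=4, seed=0):
    """Yield (train_idx, test_idx) index arrays for k-fold cross-validation.

    The sample indices are shuffled once, split into n_folds disjoint
    folds, and each fold is used in turn as the test set while the
    remaining folds form the training set.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, n_folds)
    for k in range(n_folds):
        test_idx = folds[k]
        train_idx = np.concatenate(
            [folds[j] for j in range(n_folds) if j != k]
        )
        yield train_idx, test_idx
```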

Summary

Introduction

Acoustic methods are an established technique for monitoring marine mammal populations and their behaviors. Most existing methods are feature-based classifiers, which first extract or search for deterministic features of the audio data in the time or frequency domain and then apply classification algorithms. Although this line of work has made progress in the unsupervised classification and similarity analysis of large acoustic datasets of whale calls, it still relies heavily on the effectiveness of the chosen polynomial decomposition techniques and the Fisher scores algorithm. The recent successful applications of CNN-based models to time-series classification have motivated studies aiming for better input representations of audio signals in order to train CNNs more efficiently. This study aims to apply CNNs to efficiently extract informative features from large datasets of whale calls for classification and similarity analysis.

Methodology
Classification on Raw Waveforms
Classification on Log-Mel Features
Pre-Trained CNN Models
Mix-Up Data Augmentation
Similarity Analysis and the Phylogeny
Data Preparation
Model Training
Classification of Whale-Call Data
Similarity and Phylogenic Analysis
Conclusions