Performance metrics for marine mammal signal detection and classification.

John A Hildebrand, Kaitlin E Frasier, Tyler A Helble, Marie A Roch

doi:10.1121/10.0009270

Abstract

Automatic algorithms for the detection and classification of sound are essential to the analysis of acoustic datasets with long duration. Metrics are needed to assess the performance characteristics of these algorithms. Four metrics for performance evaluation are discussed here: receiver-operating-characteristic (ROC) curves, detection-error-trade-off (DET) curves, precision-recall (PR) curves, and cost curves. These metrics were applied to the generalized power law detector for blue whale D calls [Helble, Ierley, D'Spain, Roch, and Hildebrand (2012). J. Acoust. Soc. Am. 131(4), 2682-2699] and the click-clustering neural-net algorithm for Cuvier's beaked whale echolocation click detection [Frasier, Roch, Soldevilla, Wiggins, Garrison, and Hildebrand (2017). PLoS Comp. Biol. 13(12), e1005823] using data prepared for the 2015 Detection, Classification, Localization and Density Estimation Workshop. Detection class imbalance, particularly the situation of rare occurrence, is common for long-term passive acoustic monitoring datasets and is a factor in the performance of ROC and DET curves with regard to the impact of false positive detections. PR curves overcome this shortcoming when calculated for individual detections and do not rely on the reporting of true negatives. Cost curves provide additional insight on the effective operating range for the detector based on the a priori probability of occurrence. Use of more than a single metric is helpful in understanding the performance of a detection algorithm.

Highlights

Automatic algorithms for the detection and classification of sound are essential to the analysis of acoustic datasets with long duration
Case studies for the application of these performance metrics were drawn from the Seventh International Workshop on Detection, Classification, Localization, and Density Estimation of Marine Mammals using Passive Acoustics (DCLDE, 2015)
The high-frequency dataset consists of marked time periods for encounters with echolocation clicks of species commonly found along the U.S. West Coast, including Cuvier’s (Ziphius cavirostris) and Baird’s beaked (Mesoplodon densirostris) whales (Baumann-Pickering et al., 2013), Risso’s (Grampus griseus) and Pacific white-sided (Lagenorhynchus obliquidens) dolphins (Soldevilla et al., 2008), sperm whales (Physeter macrocephalus), unidentified porpoises (Phocoenidae spp.), and unidentified odontocetes, as well as click-level markings for only Cuvier’s beaked whales that supplemented the DCLDE (2015) dataset

Summary

Introduction

Automatic algorithms for the detection and classification of sound are essential to the analysis of acoustic datasets with long duration. Four metrics for performance evaluation are discussed here: receiver-operating-characteristic (ROC) curves, detection-error-trade-off (DET) curves, precision-recall (PR) curves, and cost curves. These metrics were applied to the generalized power law detector for blue whale D calls [Helble, Ierley, D’Spain, Roch, and Hildebrand (2012). J. Acoust. Soc. Am. 131(4), 2682-2699]. We evaluate the metrics used for assessment and comparison of detection and classification algorithms, their strengths and weaknesses when applied to detection of underwater marine mammal sounds, and areas for their future development. The quality of an algorithm for automatic detection and classification of these sounds is typically evaluated by analyzing how well it performs on a labeled dataset. Discriminant function analysis provides only nominal predictions (Oswald et al., 2003), whereas support vector machines (SVM) provide a numerical value for the prediction score (Tachibana et al., 2014).
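The four curves discussed above are all traced out from the same confusion counts as a detection threshold is swept over the prediction scores. The following is a minimal sketch, not the authors' implementation: the function names, the synthetic scores, and the equal-cost default are assumptions made for illustration. It computes one operating point (recall and false positive rate for ROC, miss rate for DET, precision for PR) and the normalized expected cost commonly used for cost curves, on a deliberately imbalanced toy dataset.

```python
from typing import Dict, Sequence


def operating_point(scores: Sequence[float], labels: Sequence[bool],
                    threshold: float) -> Dict[str, float]:
    """Rates at one threshold; a score >= threshold counts as a detection."""
    tp = fp = tn = fn = 0
    for score, is_call in zip(scores, labels):
        detected = score >= threshold
        if detected and is_call:
            tp += 1          # true positive: call present and detected
        elif detected:
            fp += 1          # false positive: detection with no call
        elif is_call:
            fn += 1          # false negative: missed call
        else:
            tn += 1          # true negative: correctly ignored
    return {
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "miss_rate": fn / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        # Convention (an assumption here): precision is 1.0 with no detections.
        "precision": tp / (tp + fp) if tp + fp else 1.0,
    }


def normalized_expected_cost(miss_rate: float, false_positive_rate: float,
                             p_call: float, cost_miss: float = 1.0,
                             cost_false_alarm: float = 1.0) -> float:
    """Cost-curve ordinate for a given a priori probability of a call."""
    weighted_pos = p_call * cost_miss
    weighted_neg = (1.0 - p_call) * cost_false_alarm
    # Probability-cost value: the prior, reweighted by misclassification costs.
    pc = weighted_pos / (weighted_pos + weighted_neg)
    return miss_rate * pc + false_positive_rate * (1.0 - pc)


# Made-up, heavily imbalanced data: 10 calls among 1010 candidate events.
scores = [0.9] * 8 + [0.2] * 2 + [0.9] * 20 + [0.1] * 980
labels = [True] * 10 + [False] * 1000

point = operating_point(scores, labels, threshold=0.5)
cost = normalized_expected_cost(point["miss_rate"],
                                point["false_positive_rate"],
                                p_call=10 / 1010)
print(point)
print(cost)
```

With these invented counts the false positive rate is 0.02, yet only 8 of the 28 detections are real calls, so precision is about 0.29. This is the rare-occurrence effect that ROC and DET curves tend to understate and PR curves expose. Under equal costs the normalized expected cost is 22/1010 (about 0.022), which is higher than the 10/1010 (about 0.010) obtained by never reporting a detection, illustrating how a cost curve can show that a given prior lies outside a detector's effective operating range.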
