Abstract

Tracking the origin of propagating wave signals in an environment with complex reflective surfaces is, in its full generality, a nearly intractable problem that has engendered multiple domain-specific literatures. We posit that, if the environment and sensor geometries are fixed, machine learning algorithms can "learn" the acoustical geometry of the environment and accurately track signal origin. In this paper, we propose the first machine-learning-based approach to identifying the source locations of semi-stationary, tonal, dolphin-whistle-like sounds in a highly reverberant space, specifically a half-cylindrical dolphin pool. Our algorithm works by supplying a learning network with an overabundance of location "clues", which are then selected under supervised training for their ability to discriminate source location in this particular environment. More specifically, we deliver estimated time differences of arrival (TDOAs) and normalized cross-correlation values computed from pairs of hydrophone signals to a random forest model for high-feature-volume classification and feature selection, and subsequently feed the selected features into linear discriminant analysis, linear and quadratic Support Vector Machine (SVM), and Gaussian process models. Based on data from 14 sound source locations and 16 hydrophones, our classification models yielded perfect accuracy at predicting novel sound source locations. Our regression models yielded better accuracy than the established Steered-Response Power (SRP) method when all training data were used, and comparable accuracy along the pool surface when deprived of training data at testing sites; our methods additionally boast improved computation time and the potential for superior localization accuracy in all dimensions with more training data. Because of the generality of our method, we argue it may be useful in a much wider variety of contexts.
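
As a rough sketch of the feature pipeline just described (not the authors' implementation), the snippet below estimates a TDOA and a normalized peak cross-correlation for every hydrophone pair from synchronized recordings, then uses a random forest's feature importances to select features for a downstream linear SVM. The use of NumPy/SciPy/scikit-learn, the helper names (`pair_features`, `snippet_features`, `select_and_classify`), and all parameter values are illustrative assumptions.

```python
# Minimal sketch of the abstract's pipeline. Assumptions: 16 time-aligned
# hydrophone channels at a common sample rate fs, and scikit-learn as the
# modeling toolkit; this is not the paper's code.
from itertools import combinations

import numpy as np
from scipy.signal import correlate
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC


def pair_features(x, y, fs):
    """TDOA estimate (seconds) and normalized peak cross-correlation for one pair."""
    x = x - x.mean()
    y = y - y.mean()
    cc = correlate(x, y, mode="full")
    cc /= np.sqrt(np.sum(x ** 2) * np.sum(y ** 2)) + 1e-12   # normalize to roughly [-1, 1]
    peak = int(np.argmax(np.abs(cc)))
    tdoa = (peak - (len(y) - 1)) / fs                         # lag of the strongest peak
    return tdoa, float(np.abs(cc[peak]))


def snippet_features(channels, fs):
    """Concatenate TDOA and peak-correlation features over all hydrophone pairs."""
    feats = []
    for i, j in combinations(range(len(channels)), 2):
        feats.extend(pair_features(channels[i], channels[j], fs))
    return np.asarray(feats)          # 16 hydrophones -> 120 pairs -> 240 features


def select_and_classify(X, y, keep=60):
    """Random-forest feature selection followed by a linear SVM classifier."""
    rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
    top = np.argsort(rf.feature_importances_)[::-1][:keep]   # most informative "clues"
    clf = SVC(kernel="linear").fit(X[:, top], y)
    return top, clf
```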

Highlights

  • While we found that the pairwise TDOA estimates were not reliable enough to accurately localize tonal sources using Spherical Interpolation, we suspected that they might still be useful as part of a larger machine learning feature set.

  • While the regression models’ overall performance on test snippets representing novel source locations was not satisfactory, admitting error larger than an average dolphin’s length (Euclidean Median Absolute Deviation (MAD) of 3.37 m, interquartile range (IQR) = 2.85–3.75 m), decomposing the error along the three Cartesian axes (X-axis MAD of 0.56 m, IQR = 0.26–1.03 m; Y-axis MAD of 0.50 m, IQR = 0.17–1.62 m; Z-axis MAD of 2.73 m, IQR = 2.14–3.39 m) showed that the Euclidean localization error was dominated by error along the Z-axis, the direction of pool depth (Fig 7).

  • The results suggest that Gaussian process regression can perform as well as Steered-Response Power (SRP) for localizing tonals across the pool surface, which is often sufficient for distinguishing among potential sound sources based on overhead imaging (a rough sketch of this regression evaluation follows the list).
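
As a rough illustration of the regression-side evaluation behind the last two highlights (not the authors' code), the sketch below fits one Gaussian process regressor per Cartesian coordinate to the pooled TDOA/cross-correlation features and reports the per-axis median absolute localization error and interquartile range. The scikit-learn kernel choice, the helper names, and the reading of "MAD" as the median of the absolute error are all assumptions.

```python
# Sketch of the regression evaluation. Assumed inputs: X, a feature matrix of
# pooled TDOA / cross-correlation values per snippet; P, true source positions
# in meters as an (n_snippets, 3) array of x, y, z. Assumed tooling: scikit-learn.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel


def fit_gp_localizer(X_train, P_train):
    """One independent GP per Cartesian axis (an assumption, not necessarily the paper's setup)."""
    kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel()
    return [GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X_train, P_train[:, k])
            for k in range(3)]


def per_axis_error_stats(models, X_test, P_test):
    """Median absolute localization error and IQR per axis, plus the Euclidean median."""
    pred = np.column_stack([m.predict(X_test) for m in models])
    err = np.abs(pred - P_test)
    for k, axis in enumerate("XYZ"):
        q1, med, q3 = np.percentile(err[:, k], [25, 50, 75])
        print(f"{axis}-axis: MAD = {med:.2f} m, IQR = {q1:.2f}-{q3:.2f} m")
    print(f"Euclidean: MAD = {np.median(np.linalg.norm(pred - P_test, axis=1)):.2f} m")
```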



Introduction

Many researchers expect to find significant communicative capacity in dolphins given their complex social structure [1,2,3], advanced cognition including the capacity for mirror self-recognition [4], culturally transmitted tool use and other behaviors [5], varied and adaptive foraging strategies [6], and their capacity for metacognition [7]. A particular narrowband class of call, termed the whistle, has been identified as socially important. For the common bottlenose dolphin, Tursiops truncatus (arguably the focal species of most dolphin cognitive and communication research), research has focused on signature whistles: individually distinctive whistles [14,15,16] that may convey an individual’s identity to conspecifics [15, 17] and that can be mimicked, potentially to gain conspecifics’ attention [18].
