Designing better frog call recognition models

Paul S Crump, Jeff Houlahan

doi:10.1002/ece3.2730

Abstract

Advances in bioacoustic technology, such as the use of automatic recording devices, allow wildlife monitoring at large spatial scales. However, such technology can produce enormous amounts of audio data that must be processed and analyzed. One potential solution to this problem is the use of automated sound recognition tools, but we lack a general framework for developing and validating these tools. Recognizers are computer models of an animal sound assembled from “training data” (i.e., actual samples of vocalizations). The settings of the variables used to create recognizers can affect performance, and different settings can produce large differences in error rates that can be exploited for different monitoring objectives. We used Song Scope (Wildlife Acoustics Inc.) to build recognizers, and vocalizations of the wood frog (Lithobates sylvaticus) to test how different settings and amounts of training data influence recognizer performance. Performance was evaluated using precision (the probability of a recognizer match being a true match) and sensitivity (the proportion of vocalizations detected) based on a receiver operating characteristic (ROC) curve‐determined score threshold. Evaluations were conducted using recordings not used to build the recognizer. Wood frog recognizer performance was sensitive to setting changes in four out of nine variables, and small improvements were achieved by using additional training data from different sites and from the same recording, but not from different recordings from the same site. Overall, the effect of changes to variable settings was much greater than the effect of increasing training data. Additionally, by testing the performance of the recognizer on vocalizations not used to build the recognizer, we found that Type I error rates appear idiosyncratic, so we do not recommend extrapolating them from training data to new data, whereas Type II error rates were more consistent, so extrapolation can be justified. Optimizing variable settings on independent recordings led to a better match between recognizer performance and monitoring objectives. We provide general recommendations for application of this methodology with other species and make some suggestions for improvements.
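As an illustration of the two performance measures defined in the abstract, the following minimal Python sketch computes precision and sensitivity for a set of recognizer matches at a given score threshold. It is not the authors' code and does not use Song Scope: the function name, the input layout, and the example scores are all hypothetical, and the threshold is passed in directly rather than derived from a ROC curve.

```python
import math


def precision_and_sensitivity(scores, is_true_call, n_calls_present, threshold):
    """Return (precision, sensitivity) for matches scoring at or above threshold.

    scores          -- match score of each recognizer match (hypothetical values)
    is_true_call    -- True if the match is a real vocalization, else False
    n_calls_present -- vocalizations actually present, e.g. from manual review
    threshold       -- minimum score for a match to be accepted
    """
    accepted = [truth for s, truth in zip(scores, is_true_call) if s >= threshold]
    true_pos = sum(accepted)

    # Precision: probability that an accepted match is a true match.
    precision = true_pos / len(accepted) if accepted else math.nan
    # Sensitivity: proportion of the vocalizations present that were detected.
    sensitivity = true_pos / n_calls_present if n_calls_present else math.nan
    return precision, sensitivity


# Made-up example: six matches from a recording containing five real calls.
scores = [80, 72, 65, 60, 55, 50]
is_true_call = [True, True, False, True, False, False]

lenient = precision_and_sensitivity(scores, is_true_call, 5, threshold=60)
strict = precision_and_sensitivity(scores, is_true_call, 5, threshold=70)
print("threshold 60:", lenient)
print("threshold 70:", strict)
```

In this made-up example, raising the threshold from 60 to 70 removes the one false match among the accepted matches (fewer Type I errors) but also drops a true call (more Type II errors). This is the kind of trade-off that can be tuned to a monitoring objective.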

Full Text