Abstract

Long-term soundscape recordings are useful for a variety of applications, most notably in bioacoustics. However, processing such data is currently limited by the difficulty of detecting target sounds efficiently and reliably, since these sounds are often sparse and masked by environmental noise. This paper proposes a sound detector based on changepoint theory applied to a wavelet representation of the sound. In contrast to existing methods, this framework permits theoretical analysis of the detector's performance and of its optimality for downstream applications. The statistical and algorithmic developments that support these claims are presented. The method is then tested on a real task: detecting two bird species in acoustic surveys. Compared to commonly used alternatives, the proposed method consistently produced a lower false alarm rate and improved survey efficiency, as measured by the precision of the inferred population size. Finally, it is demonstrated how the method can be combined with a simple classifier to detect cat sounds in domestic recordings, an example from the Detection and Classification of Acoustic Scenes and Events (DCASE) 2018 workshop. The resulting performance is comparable to that of state-of-the-art deep learning models while requiring much less training data.

