Abstract

Passive acoustic monitoring is an important technique for detecting long-term ecological change. Because audio data contain complex temporal and spectral information, deep learning is an ideal computational framework for automatic processing. In this work, the dataset consists of 64,000 annotated clips distributed across 23 classes, derived from 17,000 h of acoustic data comprising bio-, geo-, and anthropophonies recorded at two sites in the Capital Region of New York State. A hierarchical deep convolutional neural network with one local classifier per parent node provides fine-, medium-, and coarse-grained label predictions. If a fine-grained label is predicted incorrectly, a correct medium or coarse label still provides valuable taxonomic information. While the flat classifier performs similarly to the hierarchical classifier in single-corpus training scenarios, it achieves higher accuracy on medium-grained predictions for categories such as avian vocalizations. In cross-corpus scenarios, the hierarchical classifier achieves superior performance overall. These results highlight the importance of tailoring neural network architecture to the task and domain; hierarchical structures show potential for open-set recognition tasks, where the full set of possible classes is not known in advance. [Work supported by NSF Grant No. 1631674 and an RPI HASS Fellowship.]
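
A minimal sketch (not the authors' implementation) of the "local classifier per parent node" idea described above: a feature vector from a shared CNN backbone is passed to one classifier at the root (coarse level) and to one classifier per parent node at the medium and fine levels, so each clip is routed down the taxonomy. PyTorch, the class name LCPNHead, and all layer and taxonomy sizes are illustrative assumptions, not details from the study.

import torch
import torch.nn as nn

class LCPNHead(nn.Module):
    """Coarse -> medium -> fine prediction with one local classifier per parent node."""

    def __init__(self, feat_dim, n_coarse, medium_per_coarse, fine_per_medium):
        super().__init__()
        self.coarse = nn.Linear(feat_dim, n_coarse)            # root (coarse) classifier
        self.medium = nn.ModuleList(                            # one head per coarse class
            nn.Linear(feat_dim, n) for n in medium_per_coarse)
        self.fine = nn.ModuleList(                              # one head per medium class
            nn.Linear(feat_dim, n) for n in fine_per_medium)
        # Offsets map a (coarse class, local medium index) pair to a global medium index.
        self.offsets = [0]
        for n in medium_per_coarse[:-1]:
            self.offsets.append(self.offsets[-1] + n)

    def forward(self, feats):                                   # feats: (batch, feat_dim)
        coarse_logits = self.coarse(feats)
        coarse_pred = coarse_logits.argmax(dim=1)
        medium_pred, fine_logits = [], []
        for f, c in zip(feats, coarse_pred):
            m_logits = self.medium[int(c)](f)                   # classifier of the predicted parent
            m = self.offsets[int(c)] + int(m_logits.argmax())   # global medium-class index
            medium_pred.append(m)
            fine_logits.append(self.fine[m](f))
        return coarse_logits, medium_pred, fine_logits

# Illustrative taxonomy: 3 coarse classes, 7 medium classes, 20 fine classes (arbitrary sizes).
head = LCPNHead(feat_dim=128, n_coarse=3,
                medium_per_coarse=[2, 3, 2],
                fine_per_medium=[3, 2, 2, 4, 2, 3, 4])
coarse, medium, fine = head(torch.randn(8, 128))

Even when the fine-level head at the leaf is wrong, the coarse and medium predictions made higher in the tree remain available, which is the property the abstract highlights.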
