Abstract

The application of machine learning methods has led to major advances in the development of automated recognisers used to analyse bioacoustics data. To further improve the performance of automated call recognisers, we investigated the development of efficient data annotation strategies and how best to address uncertainty around ambiguous vocalisations. These challenges pose a particular problem for species whose vocalisations are rare in field recordings, where collecting enough training data can be difficult and a species' vocalisations may be poorly documented.

We provide an open-access solution to address these challenges using two strategies. First, we applied an active learning framework to iteratively improve a convolutional neural network (CNN) model that automates call identification for a target rare bird species, the southern black-throated finch (Poephila cincta cincta). We collected 9098 h of unlabelled audio recordings from a field study in the Desert Uplands Bioregion of Queensland, Australia, and used active learning to prioritise human annotation effort towards the data that would best improve model fit. Second, we advanced methods for managing ambiguous vocalisations by applying machine learning approaches more commonly used in medical image analysis and natural language processing. Specifically, we assessed agreement among human annotators and the CNN model (i.e. inter-annotator agreement) and used this to determine realistic performance targets for the CNN model and to identify areas where inter-annotator agreement could be improved.
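At its core, an active learning framework of this kind repeatedly asks human annotators to label the unlabelled clips the current model is least sure about. A minimal sketch of that selection step (uncertainty sampling near a 0.5 decision threshold) is shown below; the `select_for_annotation` helper and the toy scores are illustrative assumptions, not the study's code:

```python
def select_for_annotation(scores, k):
    """Uncertainty sampling: pick the k clips whose predicted call
    probability is closest to 0.5, i.e. where the model is least sure.
    Returns the chosen clip indices in ascending order."""
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i] - 0.5))
    return sorted(ranked[:k])

# Toy detector scores for eight unlabelled clips
# (predicted probability that each clip contains a target call).
scores = [0.01, 0.97, 0.48, 0.52, 0.90, 0.10, 0.55, 0.03]
print(select_for_annotation(scores, 3))  # → [2, 3, 6]
```

In each iteration the selected clips are labelled by an annotator, added to the training set, and the model is retrained, so annotation effort concentrates on the examples most likely to improve model fit.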
We also applied a classification approach that allowed the CNN model to assign sounds to an ‘uncertain’ category, which replicated a requirement of human annotation and facilitated comparison of human and model annotation performance.

We found that active learning was an efficient strategy for building a CNN model when limited labelled training data were available and target calls were extremely rare in the unlabelled data. As few as five active learning iterations, generating a final labelled dataset of 1073 target calls and 5786 non-target sounds, were required to train a model that identified the target species with performance comparable to that of experts in the field.

Assessment of inter-annotator agreement identified a bias in our model towards aligning its predictions most closely with those of the primary annotator, and revealed significant differences in inter-annotator agreement among subsets of our acoustic data. Our results highlight the use of inter-annotator agreement to understand model performance and to identify areas for improvement in data annotation. We also show that excluding ambiguous vocalisations during data annotation leads to an overestimation of model performance, an important consideration for datasets with inter-annotator disagreement.
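Inter-annotator agreement of the kind assessed here is commonly quantified with chance-corrected statistics such as Cohen's kappa, which can be computed over a label set that includes an explicit ‘uncertain’ class. A minimal sketch follows; the `cohens_kappa` helper and the example labels are hypothetical illustrations, not data from the study:

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement between two annotators,
    corrected for the agreement expected by chance."""
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[lab] * cb[lab] for lab in set(a) | set(b)) / n**2
    return (observed - expected) / (1 - expected)

# Hypothetical labels from two annotators over the same ten clips,
# including an explicit 'uncertain' category.
ann1 = ["call", "call", "other", "uncertain", "call",
        "other", "other", "call", "uncertain", "other"]
ann2 = ["call", "other", "other", "uncertain", "call",
        "other", "call", "call", "uncertain", "other"]
print(round(cohens_kappa(ann1, ann2), 3))  # → 0.688
```

The same statistic can be computed between each human annotator and the model's predictions, which is one way to compare model performance against a realistic human benchmark rather than an assumed gold standard.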
