Abstract
Automated systems driven by machine learning are increasingly used as environmental monitoring tools. A common approach is to use classification algorithms to obtain counts of categories (e.g. species) from images. However, these counts can be biased in the presence of classification error. To draw valid conclusions, it is crucial to incorporate these errors into the analysis and interpretation of the classifier's results. We introduce a general framework for describing counts subject to classification error, combining data from the classifier itself with a confusion matrix. The framework incorporates uncertainty in the confusion matrix as well as uncertainty in the generating process. By treating the classification errors as latent variables, it accommodates a wide range of generating processes. We illustrate our methods with three case studies based on simulated data from different generating processes, and with data from a machine learning algorithm used to identify zooplankton in the Celtic Seas and English Channel. The framework is widely applicable across subject areas where classification errors occur.
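To make the bias concrete, the sketch below simulates the setting the abstract describes: true category counts are distorted by a known confusion matrix, and a naive point-estimate correction inverts that matrix. The confusion matrix values and class counts are illustrative assumptions, not data from the paper, and the inversion is only a simple baseline; the paper's framework instead treats the misclassifications as latent variables, which also propagates the uncertainty that this sketch ignores.

```python
import numpy as np

# Hypothetical 3-class confusion matrix: rows are true classes,
# columns are predicted classes, entry [i, j] = P(predicted j | true i).
confusion = np.array([
    [0.90, 0.08, 0.02],
    [0.05, 0.85, 0.10],
    [0.03, 0.07, 0.90],
])

# Illustrative true abundances (e.g. three zooplankton taxa).
true_counts = np.array([500.0, 300.0, 200.0])

# Expected observed counts are biased by misclassification:
# each true individual is reassigned according to its row of the matrix.
observed = true_counts @ confusion

# Naive correction: invert the confusion matrix to recover the
# expected true counts. This gives a point estimate only and can
# produce unstable or negative values when errors are large.
recovered = observed @ np.linalg.inv(confusion)

print("observed: ", np.round(observed, 1))
print("recovered:", np.round(recovered, 1))
```

Note that the observed counts (e.g. class 1 appears inflated by spillover from classes 0 and 2) differ from the true counts even though the total is preserved, which is exactly the bias the abstract warns against interpreting at face value.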