Validation methods for plankton image classification systems

Pablo González,Eva Álvarez,Ángel López‐Urrutia,Juan José Del Coz,Jorge Díez

doi:10.1002/lom3.10151

Open Access

https://doi.org/10.1002/lom3.10151

Abstract

In recent decades, the automatic study and analysis of plankton communities using imaging techniques has advanced significantly. The effectiveness of these automated systems appears to have improved, reaching acceptable levels of accuracy. However, plankton ecologists often find that classification systems do not work as well as expected when applied to new samples. This paper proposes a methodology for assessing the efficacy of learned models that takes into account the fact that the data distribution (the plankton composition of the sample) can vary between the model building phase and the production phase. Unlike most validation methods, which treat the individual organism as the unit of validation, our approach validates by sample, which is more appropriate when the objective is to estimate the abundance of different morphological groups. We argue that, in these cases, the base unit for correctly estimating the error is the sample, not the individual. Model assessment processes therefore require groups of samples with sufficient variability in order to provide precise error estimates.

Highlights

In recent decades, the automatic study and analysis of plankton communities using imaging techniques has advanced significantly
Our goal is to propose an assessment methodology that ensures that training and testing datasets change, introducing the data distribution variations that will occur under real conditions
... methods, focusing on the differences between those based on performance at the individual level and those based on samples

Summary

Introduction

The automatic study and analysis of plankton communities using imaging techniques has advanced significantly. The effectiveness of these automated systems appears to have improved, reaching acceptable levels of accuracy. Unlike most validation methods, which treat the individual organism as the unit of validation, our approach validates by sample, which is more appropriate when the objective is to estimate the abundance of different morphological groups. In these cases, the base unit for correctly estimating the error is the sample, not the individual. Model assessment processes require groups of samples with sufficient variability in order to provide precise error estimates.
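The contrast between validating by individual and validating by sample can be made concrete with a small sketch. The Python below is an illustration only, not the authors' protocol: it assumes a leave-one-sample-out scheme, a toy nearest-centroid classifier on a single hypothetical numeric feature, and invented data. For each held-out sample it reports both the individual-level accuracy and the mean absolute error of the estimated class proportions, so that error is measured once per sample.

```python
from collections import Counter, defaultdict


def train_centroids(items):
    """Toy nearest-centroid model: mean feature value per class (assumed stand-in)."""
    sums = defaultdict(float)
    counts = Counter()
    for feature, label in items:
        sums[label] += feature
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in counts}


def predict(centroids, feature):
    """Assign the class whose centroid is closest to the feature value."""
    return min(centroids, key=lambda label: abs(centroids[label] - feature))


def leave_one_sample_out(samples):
    """samples maps sample_id -> list of (feature, true_label).

    Each sample is held out in turn; the model is trained on all other samples.
    Returns sample_id -> (individual accuracy, mean absolute error of proportions).
    """
    results = {}
    for held_out, test_items in samples.items():
        train_items = [
            item
            for sample_id, items in samples.items()
            if sample_id != held_out
            for item in items
        ]
        centroids = train_centroids(train_items)
        predictions = [predict(centroids, feature) for feature, _ in test_items]
        truths = [label for _, label in test_items]
        n = len(test_items)
        accuracy = sum(p == t for p, t in zip(predictions, truths)) / n
        pred_counts, true_counts = Counter(predictions), Counter(truths)
        classes = set(centroids) | set(true_counts)
        abundance_mae = sum(
            abs(pred_counts[c] - true_counts[c]) for c in classes
        ) / (n * len(classes))
        results[held_out] = (accuracy, abundance_mae)
    return results


# Hypothetical data: three samples with different class compositions.
samples = {
    "s1": [(1.0, "diatom"), (1.2, "diatom"), (3.0, "ciliate")],
    "s2": [(0.9, "diatom"), (3.1, "ciliate"), (2.9, "ciliate")],
    "s3": [(1.1, "diatom"), (1.9, "ciliate"), (3.2, "ciliate"), (3.0, "ciliate")],
}

results = leave_one_sample_out(samples)
for sample_id, (accuracy, abundance_mae) in results.items():
    print(sample_id, accuracy, abundance_mae)
```

Because each fold holds out a whole sample, the training and test compositions differ from fold to fold, and the spread of the per-sample errors indicates how the model might behave when the plankton composition shifts. A real assessment would need many more samples than this toy example.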
