Ensembles of data-efficient vision transformers as a new paradigm for automated classification in ecology

S P Kyathanahally,P Brun,F Pomati,T Bulas,M Baity-Jesi,M Reyes,E Merz,T Hardeman

doi:10.1038/s41598-022-21910-0

Abstract

Monitoring biodiversity is paramount to manage and protect natural resources. Collecting images of organisms over large temporal or spatial scales is a promising practice to monitor the biodiversity of natural ecosystems, providing large amounts of data with minimal interference with the environment. Deep learning models are currently used to automate classification of organisms into taxonomic units. However, imprecision in these classifiers introduces a measurement noise that is difficult to control and can significantly hinder the analysis and interpretation of data. We overcome this limitation through ensembles of Data-efficient image Transformers (DeiTs), which we show can reach state-of-the-art (SOTA) performances without hyperparameter tuning, if one follows a simple fixed training schedule. We validate our results on ten ecological imaging datasets of diverse origin, ranging from plankton to birds. The performances of our EDeiTs are always comparable with the previous SOTA, even beating it in four out of ten cases. We argue that these ensemble of DeiTs perform better not because of superior single-model performances but rather due to smaller overlaps in the predictions by independent models and lower top-1 probabilities, which increases the benefit of ensembling.

Full Text