Embracing imperfection: Machine-assisted invertebrate classification in real-world datasets

Jarrett Blair, Michael D Weiser, Kirsten De Beurs, Michael Kaspari, Cameron Siler, Katie E Marshall

doi:10.1016/j.ecoinf.2022.101896

Open Access

https://doi.org/10.1016/j.ecoinf.2022.101896

Abstract

Despite growing concerns over the health of global invertebrate diversity, terrestrial invertebrate monitoring efforts remain poorly geographically distributed. Machine-assisted classification has been proposed as a potential solution to quickly gather large amounts of data; however, previous studies have often used unrealistic or idealized datasets to train and test their models.

In this study, we describe a practical methodology for including machine learning in ecological data acquisition pipelines. Here we train and test machine learning algorithms to classify over 72,000 terrestrial invertebrate specimens from morphometric data and contextual metadata. All vouchered specimens were collected in pitfall traps by the National Ecological Observatory Network (NEON) at 45 locations across the United States from 2016 to 2019. Specimens were photographed, and two separate machine learning paradigms were used to classify them. In the first, we used a convolutional neural network (ResNet-50), and in the second, we extracted morphometric data as feature vectors using ImageJ and used traditional machine learning methods to classify specimens. Issues stemming from inconsistent taxonomic label specificity were resolved by making classifications at the lowest identified taxonomic level (LITL). Taxa with too few specimens to be included in the training dataset were classified by the model using zero-shot classification.

When classifying specimens that were known and seen by our models, we reached a maximum accuracy of 72.7% using eXtreme Gradient Boosting (XGBoost) at the LITL. This nearly matched the maximum accuracy achieved by the CNN of 72.8% at the LITL. Models that were trained without contextual metadata underperformed models with contextual metadata. We also classified invertebrate taxa that were unknown to the model using zero-shot classification, reaching a maximum accuracy of 65.5% when using the ResNet-50, compared to 39.4% when using XGBoost.

The general methodology outlined here represents a realistic application of machine learning as a tool for ecological studies. We found that more advanced and complex machine learning methods such as convolutional neural networks are not necessarily more accurate than traditional machine learning methods. Hierarchical and LITL classifications allow for flexible taxonomic specificity at the input and output layers. These methods also help address the ‘long tail’ problem of underrepresented taxa missed by machine learning models. Finally, we encourage researchers to consider more than just morphometric data when training their models, as we have shown that the inclusion of contextual metadata can provide significant improvements to accuracy.
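
The idea of labelling at the lowest identified taxonomic level (LITL) can be illustrated with a small sketch. This is not the authors' code: the rank list, the record layout, the function names `litl_label` and `deepest_shared_rank`, and the two example specimens are all hypothetical and chosen only for illustration. The first function picks the finest rank a specimen was identified to; the second shows one possible way to compare two lineages hierarchically, which is an assumption about how agreement with an unseen taxon might be judged, not a description of the paper's scoring.

```python
from typing import Optional, Tuple

# Ranks ordered from coarsest to finest. This particular list is an
# assumption; the abstract does not state which ranks were used.
RANKS = ["class", "order", "family", "genus", "species"]


def litl_label(record: dict) -> Optional[Tuple[str, str]]:
    """Return (rank, name) for the finest rank that has been identified."""
    for rank in reversed(RANKS):
        name = record.get(rank)
        if name:
            return rank, name
    return None  # nothing identified at any rank


def deepest_shared_rank(a: dict, b: dict) -> Optional[str]:
    """Return the finest rank down to which two lineages agree.

    Walks from the coarsest rank and stops at the first mismatch or gap.
    """
    shared = None
    for rank in RANKS:
        if a.get(rank) and a.get(rank) == b.get(rank):
            shared = rank
        else:
            break
    return shared


# Hypothetical specimen records with uneven label specificity.
specimens = [
    {"class": "Insecta", "order": "Coleoptera", "family": "Carabidae",
     "genus": "Pterostichus", "species": None},
    {"class": "Arachnida", "order": "Araneae", "family": None,
     "genus": None, "species": None},
]

labels = [litl_label(s) for s in specimens]

# A hypothetical prediction from the right family but a different genus.
predicted = {"class": "Insecta", "order": "Coleoptera",
             "family": "Carabidae", "genus": "Harpalus"}
agreement = deepest_shared_rank(specimens[0], predicted)
```

Under these assumptions, a specimen identified only to order keeps an order-level label instead of being discarded, and a prediction for a genus the model never saw can still be credited at the family level.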

Journal: Ecological Informatics
Publication Date: Nov 5, 2022
Citations: 4
License type: cc-by-nc-nd


