Methods, New Software Tools, and Best Practices for Developing High-quality Training Data for Machine Learning-based Image Analysis in Biodiversity Research

Brian Stucky, Laura Brenskelle, Robert Guralnick

doi:10.3897/biss.3.37615


Open Access

https://doi.org/10.3897/biss.3.37615


Abstract

Recent progress in using deep learning techniques to automate the analysis of complex image data is opening up exciting new avenues for research in biodiversity science. However, potential applications of machine learning methods in biodiversity research are often limited by the relative scarcity of data suitable for training machine learning models. Developing high-quality training data sets can be a surprisingly challenging task that can easily consume hundreds of person-hours. In this talk, we present the results of our recent work implementing and comparing several methods for generating annotated, biodiversity-oriented image data for training machine learning models, including collaborative expert scoring, local volunteer image annotators with on-site training, and distributed, remote image annotation via citizen science platforms. We discuss error rates, among-annotator variance, and the depth of coverage required to ensure highly reliable image annotations. We also discuss the time requirements and efficiency of each method. Finally, we present new software, called ImageAnt (currently under development), that supports efficient, highly flexible image annotation workflows. ImageAnt was created primarily in response to the challenges we encountered in our own efforts to generate image-based training data for machine learning models. ImageAnt features a simple user interface and can be used to implement sophisticated, adaptive scripting of image annotation tasks.
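To illustrate why depth of coverage (the number of independent annotators per image) matters, here is a minimal sketch under a deliberately simple model that is an assumption, not the authors' analysis: every annotator assigns a binary label, is correct with the same probability p, and errs independently of the others. The consensus label is then the strict majority vote, which is correct with probability sum over k from floor(n/2)+1 to n of C(n, k) p^k (1 - p)^(n - k). The helper names `majority_vote_accuracy` and `min_coverage` are hypothetical and are not part of ImageAnt.

```python
from math import comb
from typing import Optional


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n annotators is correct.

    Assumes (hypothetically) binary labels, identical per-annotator
    accuracy p, and independent errors. n must be odd so ties cannot occur.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd number to avoid ties")
    k_min = n // 2 + 1  # smallest number of correct votes that forms a majority
    # Sum the binomial probabilities of every outcome with a correct majority.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


def min_coverage(p: float, target: float, max_n: int = 99) -> Optional[int]:
    """Smallest odd depth of coverage whose majority vote reaches `target`.

    Returns None if no odd n up to max_n is sufficient (e.g. when p <= 0.5,
    adding annotators does not help under this model).
    """
    for n in range(1, max_n + 1, 2):
        if majority_vote_accuracy(p, n) >= target:
            return n
    return None


if __name__ == "__main__":
    # Annotators who are individually 80% accurate on a binary trait.
    for n in (1, 3, 5, 7, 9):
        print(n, round(majority_vote_accuracy(0.8, n), 4))
    print("annotators needed for 95% consensus accuracy:", min_coverage(0.8, 0.95))
```

Under these assumptions, annotators who are each 80% accurate reach about 89.6% consensus accuracy with three votes and need seven votes to exceed 95%. Real annotators differ in skill and tend to make correlated mistakes on difficult images, so in practice per-annotator error rates and among-annotator variance have to be measured empirically, as the abstract describes, rather than assumed.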




Journal: Biodiversity Information Science and Standards
Publication Date: Jul 2, 2019
Citations: 1
License type: CC BY 4.0


Similar Papers

Image annotation and curation in radiology: an overview for machine learning practitioners
Fabio Galbusera ... Andrea Cina
European Radiology Experimental | VOL. 8
06 Feb 2024

Application of Machine Learning Methods for Asset Management on Power Distribution Networks
Gopal Lal Rajora ... Carlos Mateo Domingo
Emerging Science Journal | VOL. 6
31 May 2022

Seismic fragility analysis of steel moment frames using machine learning models
Hoang D Nguyen ... Myoungsu Shin
Engineering Applications of Artificial Intelligence | VOL. 126
15 Aug 2023

Assisted Cement Log Interpretation
Eirik Time ... Siddharth Mishra
27 Apr 2022
