Addressing Uncertainties in Machine Learning Predictions of Conservation Status

Barnaby Walker, Eimear Nic Lughadha, Steven Bachman, Tarciso Leão, Eve Lucas

doi:10.3897/biss.3.37147

Open Access

Abstract

Extinction risk assessments are increasingly important to many stakeholders (Bennun et al. 2017), but there remain large gaps in our knowledge about the status of many species. The IUCN Red List of Threatened Species (IUCN 2019, hereafter Red List) is the most comprehensive assessment of extinction risk. However, it includes assessments of just 7% of all vascular plants, while 18% of all assessed animals lack sufficient data to assign a conservation status. The wide availability of species occurrence information through digitised natural history collections and aggregators such as the Global Biodiversity Information Facility (GBIF), coupled with machine learning methods, provides an opportunity to fill these gaps in our knowledge. Machine learning approaches have already been proposed to guide conservation assessment efforts (Nic Lughadha et al. 2018), assign a conservation status to species with insufficient data for a full assessment (Bland et al. 2014), and predict the number of threatened species across the world (Pelletier et al. 2018).
The wide range in sources of species occurrence records can lead to data quality issues, such as missing, imprecise, or mistaken information. These data quality issues may be compounded in databases that aggregate information from multiple sources: many such records derive from field observations (78% for plant species in GBIF; Meyer et al. 2016) largely unsupported by voucher specimens that would allow confirmation or correction of their identification. Even where voucher specimens do exist, different taxonomic or geographic information can be held for a single collection event represented by duplicate specimens deposited in different natural history collections. Tools are available to help clean species occurrence data, but these cannot deal with problems like specimen misidentification, which previous work (Nic Lughadha et al. 2019) has shown to have a large impact on preliminary assessments of conservation status.
Machine learning models based on species occurrence records have been reported to predict the conservation status of species with high accuracy. However, given the black-box nature of some of the better-performing machine learning models, it is unclear how well these accuracies apply beyond the data on which the models were trained. Practices for training machine learning models differ between studies, but more interrogation of these models is required if we are to know how much to trust their predictions. To address these problems, we compare predictions made by a machine learning model when trained on specimen occurrence records that have benefitted from minimal or more thorough cleaning, with those based on records from an expert-curated database. We then explore different techniques to interrogate machine learning models and quantify the uncertainty in their predictions.
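
The abstract does not state which techniques are used to quantify uncertainty. As one generic illustration, and not the authors' method, a reported accuracy can be given an uncertainty interval with a percentile bootstrap over the evaluated species. The sketch below uses only the Python standard library; the function name `bootstrap_accuracy_interval`, its parameters, and the label strings are hypothetical and chosen for illustration.

```python
import random


def bootstrap_accuracy_interval(y_true, y_pred, n_resamples=2000, alpha=0.05, seed=0):
    """Return (accuracy, lower, upper) using a percentile bootstrap.

    y_true and y_pred are equal-length sequences of labels, one per species.
    All names here are illustrative assumptions, not taken from the paper.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")

    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(y_true)

    # 1 where the prediction matches the assessed status, 0 otherwise.
    correct = [int(t == p) for t, p in zip(y_true, y_pred)]
    accuracy = sum(correct) / n

    # Resample species with replacement and recompute accuracy each time.
    resampled = []
    for _ in range(n_resamples):
        sample = rng.choices(correct, k=n)
        resampled.append(sum(sample) / n)
    resampled.sort()

    # Take the alpha/2 and 1 - alpha/2 percentiles of the resampled accuracies.
    lower_idx = max(0, int((alpha / 2) * n_resamples))
    upper_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples) - 1)
    return accuracy, resampled[lower_idx], resampled[upper_idx]
```

Running the same function on predictions from models trained on minimally cleaned, thoroughly cleaned, and expert-curated records would give three intervals to compare. Note that this interval reflects sampling variability in the evaluation set only; it does not capture errors such as specimen misidentification in the underlying records.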

Journal: Biodiversity Information Science and Standards
Publication Date: Jun 18, 2019
Citations: 2
License type: CC BY 4.0
