Predicting classifications in marine biomonitoring with supervised machine learning: how much data is required?

Verena Dully, Tom Wilding, Thorsten Stoeck, Timo Mühlhaus

doi:10.3897/aca.4.e64661

Abstract

Marine coastal ecosystems offer numerous ecosystem services and are therefore subject to a variety of stressors from anthropogenic activities. Environmental biomonitoring programs are thus crucial for the effective management and conservation of these ecosystems. Traditional monitoring has been based on macrofauna indices, which are laborious to compile and require expert knowledge. Recently, eDNA metabarcoding has become increasingly popular because it does not involve macrofauna species identification and is therefore more cost- and time-efficient. Studies have shown that ecosystem monitoring based on eDNA metabarcoding is feasible and that random forest (RF) algorithms can predict various biological indices, and therefore ecosystem health. To propose adequate designs for future eDNA metabarcoding-based marine coastal monitoring surveys, this study asks two questions: (1) What is the lower limit of reads for accurate RF predictions in coastal marine monitoring using microbial communities? (2) Is this limit the same for different monitoring targets? To answer them, we used four different Illumina amplicon datasets obtained from bacterial communities in different coastal environments. For each dataset, the corresponding biomonitoring-relevant prediction objectives (labels) were predicted using amplicon sequence variants (ASVs) as features. After constructing RF models from all available sequences of a dataset (the full model, serving as the benchmark for the targeted prediction accuracy), we successively down-sampled each dataset to lower sequence numbers. The prediction accuracies of the reduced models were then compared with those of the full models to assess the minimum number of features needed to reach the targeted prediction accuracy. Our results show that (1) there is no general answer to the first question and (2) the limit varies between monitoring targets.
We have identified the most informative criteria that are relevant to assess the sequencing depth required to predict a biomonitoring category using RF. This may guide future study designs and may help to estimate and control costs in applied routine DNA-based biomonitoring using RF to predict the biomonitoring target. In our contribution we will elucidate and discuss these criteria.
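The down-sampling step described above can be illustrated with a minimal sketch. It assumes that down-sampling means drawing reads at random without replacement from each sample's ASV count table (rarefaction), which the abstract does not confirm. The function names `rarefy` and `downsampling_series`, the toy counts, and the chosen depths are all hypothetical. Training and scoring the RF models at each depth is left out.

```python
import random
from typing import Dict, Iterable, List


def rarefy(counts: Dict[str, int], depth: int, rng: random.Random) -> Dict[str, int]:
    """Draw `depth` reads without replacement from one sample's ASV counts."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError(f"depth {depth} exceeds available reads {total}")
    # Expand the count table to one entry per read, then subsample the reads.
    reads: List[str] = [asv for asv, n in counts.items() for _ in range(n)]
    rarefied: Dict[str, int] = {}
    for asv in rng.sample(reads, depth):
        rarefied[asv] = rarefied.get(asv, 0) + 1
    return rarefied


def downsampling_series(
    counts: Dict[str, int], depths: Iterable[int], seed: int = 0
) -> Dict[int, Dict[str, int]]:
    """Build one reduced count table per requested sequencing depth."""
    rng = random.Random(seed)
    return {depth: rarefy(counts, depth, rng) for depth in depths}


# Hypothetical toy sample: ASV identifier -> read count (not data from the study).
sample = {"ASV_1": 600, "ASV_2": 300, "ASV_3": 90, "ASV_4": 10}
series = downsampling_series(sample, depths=[500, 100, 20])

for depth, table in series.items():
    # Rare ASVs tend to drop out as depth decreases, shrinking the feature set.
    print(f"depth={depth} asvs_retained={len(table)} reads={sum(table.values())}")
```

In a full workflow, each reduced table would feed a model whose accuracy is compared with that of the full model. The depth at which accuracy falls below the benchmark then marks the lower limit for that monitoring target.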

Highlights

Studies have shown that ecosystem monitoring based on eDNA metabarcoding is feasible and random forest (RF) algorithms can predict various biological indices, and ecosystem health
To propose adequate designs for future eDNA metabarcoding-based marine coastal monitoring surveys, the study asks: (1) What is the lower limit of reads for accurate RF predictions in coastal marine monitoring using microbial communities? (2) Is this limit the same for different monitoring targets? To answer these questions, we used four different Illumina amplicon datasets obtained from bacterial communities in different coastal environments
After construction of RF models using all available sequences of a dataset, we successively down-sampled each dataset to lower sequence numbers

Summary

Introduction

Corresponding authors: Verena Dully (vdully@rhrk.uni-kl.de), Thorsten Stoeck (stoeck@rhrk.uni-kl.de)
Received: 19 Feb 2021 | Published: 04 Mar 2021
Citation: Dully V, Wilding T, Mühlhaus T, Stoeck T (2021) Predicting classifications in marine biomonitoring with supervised machine learning: how much data is required?

Marine coastal ecosystems offer numerous ecosystem services and are subject to a variety of stressors from anthropogenic activities. Environmental biomonitoring programs for effective management and conservation of coastal marine ecosystems are crucial.

Journal: ARPHA Conference Abstracts	Publication Date: Mar 4, 2021
License type: CC BY 4.0