Abstract

Benthic sediment toxicity is linked to harmful effects in marine organisms and humans, and understanding this link would require, in part, an exhaustive analysis of the sediment toxicity data already in hand. One tool that could aid this process is machine learning (ML), in particular supervised classification modeling, which has transformed how actionable insights are extracted from large datasets. The current study is a proof of concept that seeks an ML classifier able to extrapolate accurately from a California-wide coastal training dataset of 5437 data points (assembled from 1635 samples) to predict sediment toxicity in the Southern California Bight (SCB). Twelve classifiers were trained to recognize sediment toxicity using 70 % of the dataset; among them, a Gradient Boosting Classifier (GBC) model using latitude, longitude, and water depth was the most accurate at predicting toxicity (83 %). Of these variables, latitude was the most significant driver of GBC predictions in this test ecosystem. The model's performance was verified on the remaining 30 % of the dataset and found to be 83 % accurate. Presented with 884 unfamiliar data points assembled from 854 measurements at 346 stations across the SCB, the trained GBC was 87 % accurate, demonstrating a role that supervised learning can play in environmental analytics for southern California.
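The workflow summarized above (a 70 %/30 % train/test split, a gradient boosting classifier on latitude, longitude, and water depth, and a ranking of which variable drives the predictions) can be illustrated with a minimal sketch. This assumes scikit-learn's `GradientBoostingClassifier` with default hyperparameters; the abstract does not name a library or settings. The helpers `make_synthetic_stations` and `fit_and_evaluate` are hypothetical, and the data are randomly generated with an invented, latitude-driven toxicity rule. They are not the study's measurements, and the sketch makes no attempt to reproduce the reported accuracies.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

FEATURES = ["latitude", "longitude", "water_depth_m"]


def make_synthetic_stations(n=1000, seed=0):
    """Hypothetical stand-in for a station dataset: random coordinates and
    depths with a made-up toxicity label. NOT real survey data."""
    rng = np.random.default_rng(seed)
    lat = rng.uniform(32.5, 42.0, n)
    lon = rng.uniform(-124.5, -117.0, n)
    depth = rng.uniform(5.0, 200.0, n)
    # Invented rule (assumption): southern, shallower stations are more
    # often labeled toxic (1); longitude carries no signal at all.
    score = -(lat - 35.0) - 0.005 * depth + rng.normal(0.0, 1.0, n)
    y = (score > 0).astype(int)
    X = np.column_stack([lat, lon, depth])
    return X, y


def fit_and_evaluate(X, y, seed=0):
    # Hold out 30 % of rows for verification; train on the other 70 %.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y
    )
    model = GradientBoostingClassifier(random_state=seed)
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    # Impurity-based importances (normalized to sum to 1) indicate which
    # feature the boosted trees split on most.
    importances = dict(zip(FEATURES, model.feature_importances_))
    return model, acc, importances, len(X_train), len(X_test)


if __name__ == "__main__":
    X, y = make_synthetic_stations()
    _, acc, importances, n_train, n_test = fit_and_evaluate(X, y)
    print(f"train={n_train} test={n_test} held-out accuracy={acc:.2f}")
    for name, value in sorted(importances.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {value:.3f}")
```

Because the synthetic label was built mainly from latitude, latitude should dominate the importance ranking here. That only shows how such a ranking is read; it is not evidence for the study's finding. Comparing twelve classifiers would amount to repeating `fit_and_evaluate` with other scikit-learn estimators on the same split.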
