Abstract

Metals are considered among the most hazardous substances because of their potential for accumulation, magnification, persistence, and wide distribution in water, sediments, and aquatic organisms. Demersal fish species, such as turbot (Psetta maxima maeotica), are accepted by the scientific community as suitable bioindicators of heavy metal pollution in the aquatic environment. The present study uses a machine learning approach, based on multiple linear and non-linear models, to estimate the concentrations of heavy metals in both muscle and liver tissues of turbot. The multiple linear regression (MLR) models were built with the stepwise method, while the non-linear models were developed with the random forest (RF) algorithm. The models were based on data from the scientific literature covering 11 elements (As, Ca, Cd, Cu, Fe, K, Mg, Mn, Na, Ni, Zn) in both muscle and liver tissues of turbot specimens. Significant MLR models were obtained for Ca, Fe, Mg, and Na in muscle tissue and for K, Cu, Zn, and Na in liver tissue. Non-linear tree-based RF prediction models with over 70% prediction accuracy were identified for As, Cd, Cu, K, Mg, and Zn in muscle tissue and for As, Ca, Cd, Mg, and Fe in liver tissue. Both the MLR and the RF models were found suitable for predicting heavy metal concentrations in turbot muscle and liver tissues. The models can be used to improve the knowledge base and economic efficiency of linked studies on heavy metals in food safety and environmental pollution.
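The stepwise MLR idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes a forward selection rule driven by the gain in adjusted R², which the abstract does not specify; the function names, the `min_gain` threshold, and the data are hypothetical, and the data are synthetic values generated only for illustration, not measurements from the study.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns coefficients and R^2."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return coef, 1.0 - ss_res / ss_tot

def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n samples and p predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def forward_stepwise(X, y, names, min_gain=0.01):
    """Add one predictor at a time while adjusted R^2 improves by min_gain.

    Assumption: this selection rule is illustrative; the study's exact
    stepwise criterion is not given in the abstract.
    """
    n = len(y)
    selected, best = [], 0.0  # 0.0 is the adjusted R^2 of the intercept-only model
    remaining = list(range(X.shape[1]))
    while remaining:
        scores = []
        for j in remaining:
            cols = selected + [j]
            _, r2 = fit_ols(X[:, cols], y)
            scores.append((adjusted_r2(r2, n, len(cols)), j))
        score, j = max(scores)
        if score - best < min_gain:
            break
        selected.append(j)
        remaining.remove(j)
        best = score
    return [names[j] for j in selected], best

# Synthetic stand-in data: the target depends on "Ca" and "Mg" only.
rng = np.random.default_rng(0)
names = ["Ca", "Fe", "K", "Mg", "Zn"]
X = rng.normal(size=(60, len(names)))
y = 2.0 * X[:, 0] + 1.0 * X[:, 3] + rng.normal(scale=0.1, size=60)

selected_names, final_adj_r2 = forward_stepwise(X, y, names)
print(selected_names, round(final_adj_r2, 3))
```

On real data, the predictors would be the measured concentrations of the other elements in the same tissue, and the target would be the element being modeled.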

Highlights

  • The present study uses machine learning to develop multiple linear and non-linear models that estimate the concentrations of heavy metals in both turbot muscle and liver tissues, based on data from the scientific literature covering a maximum of 11 elements (As, Ca, Cd, Cu, Fe, K, Mg, Mn, Na, Ni, and Zn)

  • The machine learning multiple linear regression (MLR) and non-linear tree-based random forest (RF) prediction models are identified as being suitable for predicting the heavy metal concentration from both turbot muscle and liver tissues

  • The MLR and RF models complement each other and form a complete heavy metal analytical framework: MLR evaluates the interactions between the analyzed heavy metals in turbot muscle and liver, while the RF models accurately predict the required data
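The RF side of the framework described in the highlights can be sketched as follows. This is a minimal illustration that assumes scikit-learn is available and uses held-out R² as a stand-in for the "prediction accuracy" mentioned in the abstract, whose exact definition is not given here. The data are synthetic, and the non-linear relation between elements is hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: five predictor elements, one target concentration.
rng = np.random.default_rng(42)
names = ["Ca", "Cu", "Fe", "Mg", "Zn"]
X = rng.uniform(0.0, 1.0, size=(300, len(names)))
# Hypothetical non-linear dependence on "Ca" and "Mg" plus measurement noise.
y = np.sin(3.0 * X[:, 0]) + X[:, 3] ** 2 + rng.normal(0.0, 0.05, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# R^2 on held-out samples (assumed here as the accuracy measure).
r2 = model.score(X_test, y_test)

# Impurity-based feature importances, one value per predictor element.
importance = dict(zip(names, model.feature_importances_))
top_two = sorted(importance, key=importance.get, reverse=True)[:2]

print(f"held-out R^2: {r2:.2f}")
print("most important predictors:", top_two)
```

Reading the feature importances next to the MLR coefficients is one way the two model families could complement each other: the linear model describes the direction of each relationship, while the forest captures non-linear structure.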


Summary

A Machine Learning Approach in Analyzing

Ștefan-Mihai Petrea 1,*, Mioara Costache 2,*, Dragoș Cristea 3, Ștefan-Adrian Strungaru 4, Ira-Adeline Simionov 1,5, Alina Mogodan 1, Lacramioara Oprica 6 and Victor Cristea 1,5. The Fish Culture Research and Development Station of Nucet, 137335 Dâmbovița-Nucet, Romania. Multidisciplinary Research Platform (ReForm), University “Dunărea de Jos” of Galați, 800008 Galați, Romania. Academic Editors: Giuseppe Scarponi, Silvia Illuminati, Anna Annibaldi and Cristina Truzzi

Introduction
Results and Discussion
Positive Significant Correlations
Negative Significant Correlations
Predictive Models
The First Group MLR Models
The First Group Non-Linear Tree-Based RF Prediction Models
The Second Group Non-Linear Tree-Based RF Prediction Models
The Third Group MLR Models
The Fourth Group Non-Linear Tree-Based RF Prediction Models
The Fifth Group MLR Models
Feature Importance Overview
Analytical Framework Methods
Dataset Descriptive Statistics
Conclusions
Prediction
Python
