Abstract

Soil heavy metal pollution is increasing year by year, and its risk assessment is drawing growing attention. Soil heavy metal datasets are usually imbalanced: most samples are safe samples that are not contaminated with heavy metals. Random Forest (RF) generalizes well and is not prone to overfitting. In this paper, we modify two components of RF, the Bagging algorithm and the simple voting method, and propose W-RF, an algorithm based on adaptive Bagging and weighted voting that improves the classification performance of RF on imbalanced datasets. Adaptive Bagging ensures that the trees in RF learn from the positive samples, and weighted voting gives better-performing trees higher voting weights. In experiments that set the weights using G-mean, recall, and F1-score, W-RF outperformed RF in each case. Risk assessment experiments were then conducted with W-RF on a heavy metal dataset from agricultural fields around Wuhan. The results show that RW-RF, the variant that uses recall to calculate the classifier weights, has the best classification performance. Finally, we optimize the hyperparameters of RW-RF with a Bayesian optimization algorithm, using G-mean as the objective function to find the best hyperparameter combination within the given number of iterations.
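
To make the weighted-voting idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes each tree's weight is simply its recall on a held-out validation set (the paper's exact weight formula is not given here), and the labels, predictions, and function names are hypothetical.

```python
from collections import defaultdict

def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0

def weighted_vote(tree_predictions, weights):
    """Combine the per-tree labels for one sample; each tree votes with its weight."""
    totals = defaultdict(float)
    for label, w in zip(tree_predictions, weights):
        totals[label] += w
    return max(totals, key=totals.get)

# Hypothetical validation labels and per-tree predictions (1 = positive class).
y_val = [1, 1, 0, 0, 0, 0]
tree_val_preds = [
    [1, 1, 0, 0, 0, 1],  # tree A finds both positives
    [0, 0, 0, 0, 0, 0],  # tree B misses both positives
    [1, 0, 0, 0, 0, 0],  # tree C finds one positive
]
# Assumption: weight = validation recall of each tree.
weights = [recall(y_val, preds) for preds in tree_val_preds]

# A new sample: tree A predicts 1, trees B and C predict 0.
votes = [1, 0, 0]
simple = max(set(votes), key=votes.count)  # simple majority vote
weighted = weighted_vote(votes, weights)   # recall-weighted vote
```

In this toy case the simple majority returns 0, while the weighted vote returns 1 because the tree that detects positives reliably carries the most weight.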

Highlights

  • The accumulation of heavy metals in agricultural land affects both the quality of crops and the quality of life of the surrounding residents

  • If the number is less than pos num, bootstrap sampling is performed on the positive samples in the input training set so that the number of subtraining sets is equal to pos num

  • Regardless of whether the weights are set by G-mean or F1-score, W-RF outperforms RF and the other algorithms on all five metrics in this experiment
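
The adaptive Bagging step mentioned in the highlights can be sketched as follows. This is one possible reading, not the paper's procedure: it assumes that an ordinary bootstrap sample is drawn first and that, when it holds fewer than `pos_num` positive samples, extra positives are drawn with replacement from the training set until it holds `pos_num` of them. The function name, the `pos_num` parameter spelling, and the choice to append rather than replace samples are assumptions.

```python
import random

def adaptive_bootstrap(samples, labels, pos_num, positive=1, rng=None):
    """Bootstrap sample that is topped up to contain at least pos_num positives."""
    rng = rng or random.Random()
    n = len(samples)
    # Ordinary bootstrap: n draws with replacement.
    idx = [rng.randrange(n) for _ in range(n)]
    n_pos = sum(1 for i in idx if labels[i] == positive)
    pos_pool = [i for i in range(n) if labels[i] == positive]
    # Assumed adaptive step: resample positives until pos_num is reached.
    if n_pos < pos_num and pos_pool:
        idx += [rng.choice(pos_pool) for _ in range(pos_num - n_pos)]
    return [samples[i] for i in idx], [labels[i] for i in idx]

# Hypothetical imbalanced training set: 2 positives among 20 samples.
X = list(range(20))
y = [1, 1] + [0] * 18
sub_X, sub_y = adaptive_bootstrap(X, y, pos_num=5, rng=random.Random(0))
```

Each tree would then be trained on its own `sub_X`, `sub_y`, so no tree is fitted on a sample with too few positives to learn from.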

Summary

Introduction

The accumulation of heavy metals in agricultural land affects both the quality of crops and the quality of life of the surrounding residents. Soil is a non-renewable resource, so monitoring the heavy metal content of soil and conducting risk assessment is an important part of land management. We propose applying RF to the risk assessment of soil heavy metal contamination. RF, first proposed by Breiman in 2001 [1], is an ensemble learning method based on the Bagging algorithm [2]. Although RF performs well on most classification and regression problems, its hyperparameters are difficult to tune. Bayesian optimization (BO) [6] can handle optimization problems with black-box objective functions, so it is widely used for hyperparameter tuning of ensemble learning methods. At the end of this paper, we tune the hyperparameters of RF using BO.
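
The abstract names G-mean as the objective that BO maximizes. As a hedged illustration of why it suits imbalanced data, the sketch below uses the standard definition of G-mean, the square root of sensitivity times specificity; the function name and the example labels are hypothetical, and the BO loop itself is not shown.

```python
import math

def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (recall on positives) and specificity."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)

# Hypothetical imbalanced labels: 2 positives, 8 negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
all_negative = [0] * 10                       # 80% accurate, finds no positives
one_hit = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]      # 1 true positive, 1 false positive

score_all_negative = g_mean(y_true, all_negative)
score_one_hit = g_mean(y_true, one_hit)
```

A classifier that labels every sample as safe reaches 80% accuracy here but scores a G-mean of zero, which is why an objective of this kind steers tuning toward models that also detect the minority class.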

Bagging
RSM and CART Decision Tree
Bayesian Optimization
Adaptive Bagging
Weighted Voting
W-RF Algorithm
Dataset
Performance Comparison
Parameter Optimization
Findings
Conclusion and Future Work