Evaluation of six methods for correcting bias in estimates from ensemble tree machine learning regression models

K. Belitz, P.E. Stackelberg

doi:10.1016/j.envsoft.2021.105006

Abstract

Ensemble-tree machine learning (ML) regression models can be prone to systematic bias: small values are overestimated and large values are underestimated. Additional bias can be introduced if the dependent variable is a transform of the original data. Six methods were evaluated for their ability to correct systematic and introduced bias. Method performance was evaluated using four case studies of groundwater quality: the units of the dependent variable were pH in two and log-concentration in the others. When performance metrics (bias and RMSE for both points and the CDF) were computed using the same units as those in the ML model, empirical distribution matching (EDM) provided the best results. When the metrics were computed using retransformed concentration, EDM and a method incorporating Duan's smearing estimate were both effective. A method based on the Z-score transform approximates EDM if the correlation coefficient between rank-ordered ML estimates and rank-ordered observations approaches one.
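As a rough illustration of three of the ideas named in the abstract, the sketch below shows one plausible reading of empirical distribution matching, the Z-score transform, and Duan's smearing factor. It is not the authors' implementation. The function names, the synthetic pH-like data, the use of linear interpolation between ranks, and the assumption that log-concentration means log10 are all choices made here for illustration.

```python
import numpy as np


def edm_correct(train_est, train_obs, new_est):
    """Empirical distribution matching (assumed form): map estimates onto
    the observed distribution by pairing values of equal rank."""
    est_sorted = np.sort(np.asarray(train_est, dtype=float))
    obs_sorted = np.sort(np.asarray(train_obs, dtype=float))
    # k-th smallest training estimate -> k-th smallest observation, with
    # linear interpolation in between; np.interp clamps values outside the
    # training range. Tied estimates are not treated specially here.
    return np.interp(np.asarray(new_est, dtype=float), est_sorted, obs_sorted)


def zscore_correct(train_est, train_obs, new_est):
    """Z-score transform: rescale estimates to the observed mean and
    standard deviation."""
    z = (np.asarray(new_est, dtype=float) - np.mean(train_est)) / np.std(train_est)
    return np.mean(train_obs) + z * np.std(train_obs)


def duan_smearing_factor(log10_obs, log10_est):
    """Duan's smearing factor, assuming a log10 transform: the mean of the
    retransformed residuals."""
    resid = np.asarray(log10_obs, dtype=float) - np.asarray(log10_est, dtype=float)
    return float(np.mean(10.0 ** resid))


# Synthetic pH-like data (illustrative only, not taken from the paper).
rng = np.random.default_rng(0)
obs = rng.normal(7.0, 1.0, size=500)
# Estimates compressed toward the mean: small values are overestimated and
# large values are underestimated, as described in the abstract.
est = 7.0 + 0.5 * (obs - 7.0) + rng.normal(0.0, 0.2, size=500)

est_edm = edm_correct(est, obs, est)    # same distribution as obs
est_z = zscore_correct(est, obs, est)   # same mean and std as obs
```

Under the log10 assumption, a retransformed concentration would be computed as 10**estimate multiplied by the smearing factor. When the rank-ordered estimates are an exact linear function of the rank-ordered observations, the two correction functions above return the same values, which is consistent with the abstract's remark that the Z-score method approximates EDM as that correlation approaches one.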

Journal: Environmental Modelling & Software
Publication Date: Feb 24, 2021
Citations: 52
License type: cc-by
