Abstract
Ensembling combines the predictions made by individual component base models with the goal of achieving a predictive accuracy that is better than that of any one of the constituent member models. Diversity among the base models' predictions is a crucial criterion in ensembling. In practice, however, the available base models sometimes produce highly correlated predictions, because they may have been developed within the same research group or built from the same underlying algorithm. We investigated, via a case study on Fusarium head blight (FHB) on wheat in the U.S., whether ensembles of simple yet highly correlated models for predicting the risk of FHB epidemics, all generated from logistic regression, provided any benefit to predictive performance despite the relatively low diversity among the base models. Three ensembling methods were explored: soft voting, weighted averaging of smaller subsets of the base models, and penalized regression as a stacking algorithm. Soft voting and weighted model averages were generally, though not universally, better at classification than the base models. The stacked regressions outperformed the other two ensembling methods analyzed in this study. Ensembling simple yet correlated models is computationally feasible and is therefore worth pursuing for models of epidemic risk.
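For concreteness, the sketch below illustrates the three ensembling strategies named in the abstract: soft voting, a weighted average over a subset of base models, and stacking with a penalized logistic regression as the meta-learner. It is a minimal illustration on synthetic data using scikit-learn, not the code used in the study; the synthetic data, predictor subsets, weights, and choice of L2 penalty are all assumptions made for the example.

```python
# Minimal sketch (not the study's code) of the three ensembling strategies,
# applied to synthetic data with scikit-learn. The data, predictor subsets,
# weights, and L2 penalty are illustrative assumptions only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=8, n_informative=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Base learners: simple logistic regressions fit to overlapping predictor
# subsets, which tends to yield correlated predictions (as in the case study).
subsets = [[0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5]]
models = [LogisticRegression().fit(X_tr[:, s], y_tr) for s in subsets]
P_tr = np.column_stack([m.predict_proba(X_tr[:, s])[:, 1] for m, s in zip(models, subsets)])
P_te = np.column_stack([m.predict_proba(X_te[:, s])[:, 1] for m, s in zip(models, subsets)])

# 1) Soft voting: unweighted mean of the base-model probabilities.
soft_vote = P_te.mean(axis=1)

# 2) Weighted average of a smaller subset of the base models
#    (the two models chosen and their weights are arbitrary here).
weighted_avg = P_te[:, :2] @ np.array([0.7, 0.3])

# 3) Stacking: a penalized (L2) logistic regression as the meta-learner,
#    trained on the base-model probabilities. (A full workflow would use
#    out-of-fold predictions here to avoid information leakage.)
meta = LogisticRegressionCV(penalty="l2", cv=5).fit(P_tr, y_tr)
stacked = meta.predict_proba(P_te)[:, 1]

for name, p in [("soft vote", soft_vote), ("weighted avg", weighted_avg), ("stacked", stacked)]:
    print(name, "accuracy:", ((p > 0.5) == y_te).mean())
```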
Introduction
When making important decisions, we naturally seek different opinions. Translated to prediction, this means consulting different models, each of which makes predictions with some level of uncertainty, inasmuch as any model only approximates the truth. Ensembling takes a set of predictions from individual models and combines them such that the performance of the ensemble is, ideally, better than that of any one of the constituent models. The individual models (the base learners) in an ensemble should ideally show low correlations among their predictions [1,10], as this enables the higher-level ensembling algorithm (the meta-learner) to find a combination of those predictions that improves upon the prediction made by any single base learner. Put another way, ensembling requires the base learners to make different errors on the same observations [10]. If the base learners are highly correlated (i.e., they make very similar predictions on the same observations), theory suggests that the benefits of ensembling are negated.
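To make the notion of base-learner diversity concrete, the short sketch below (illustrative only, not part of the study) computes the pairwise Pearson correlations among base-model predicted probabilities; off-diagonal values near 1 indicate the highly correlated, low-diversity situation described above.

```python
# Minimal sketch: quantify base-learner diversity via pairwise correlations.
# `P` is assumed to hold predicted probabilities, one column per base model;
# here it is filled with near-duplicate placeholder values purely for illustration.
import numpy as np

rng = np.random.default_rng(1)
signal = rng.uniform(size=200)                       # shared component -> induces correlation
P = np.column_stack([np.clip(signal + rng.normal(scale=0.05, size=200), 0, 1)
                     for _ in range(4)])             # four near-duplicate "base models"

corr = np.corrcoef(P, rowvar=False)                  # model-by-model Pearson correlations
print(np.round(corr, 2))                             # off-diagonals near 1 => low diversity
```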