Abstract
Ensembling combines the predictions made by individual component base models with the goal of achieving a predictive accuracy that is better than that of any one of the constituent member models. Diversity among the base models' predictions is a crucial criterion in ensembling. In practice, however, the available base models sometimes produce highly correlated predictions, because they may have been developed within the same research group or built from the same underlying algorithm. We investigated, via a case study on Fusarium head blight (FHB) on wheat in the U.S., whether ensembles of simple yet highly correlated models for predicting the risk of FHB epidemics, all generated from logistic regression, provided any benefit to predictive performance despite the relatively low diversity among the base models. Three ensembling methods were explored: soft voting, weighted averaging of smaller subsets of the base models, and penalized regression as a stacking algorithm. Soft voting and weighted model averages were generally, though not universally, better at classification than the base models. The stacked regressions outperformed the other two ensembling methods analyzed in this study. Ensembling simple yet correlated models is computationally feasible and is therefore worth pursuing for models of epidemic risk.
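For concreteness, the sketch below illustrates the three ensembling strategies named in the abstract: soft voting, a weighted average over a subset of base models, and stacking with a penalized logistic regression as the meta-learner. It is a minimal illustration on synthetic data using scikit-learn, not the code used in the study; the synthetic data, predictor subsets, weights, and choice of L2 penalty are all assumptions made for the example.

```python
# Minimal sketch (not the study's code) of the three ensembling strategies,
# applied to synthetic data with scikit-learn. The data, predictor subsets,
# weights, and L2 penalty are illustrative assumptions only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=8, n_informative=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Base learners: simple logistic regressions fit to overlapping predictor
# subsets, which tends to yield correlated predictions (as in the case study).
subsets = [[0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5]]
models = [LogisticRegression().fit(X_tr[:, s], y_tr) for s in subsets]
P_tr = np.column_stack([m.predict_proba(X_tr[:, s])[:, 1] for m, s in zip(models, subsets)])
P_te = np.column_stack([m.predict_proba(X_te[:, s])[:, 1] for m, s in zip(models, subsets)])

# 1) Soft voting: unweighted mean of the base-model probabilities.
soft_vote = P_te.mean(axis=1)

# 2) Weighted average of a smaller subset of the base models
#    (the two models chosen and their weights are arbitrary here).
weighted_avg = P_te[:, :2] @ np.array([0.7, 0.3])

# 3) Stacking: a penalized (L2) logistic regression as the meta-learner,
#    trained on the base-model probabilities. (A full workflow would use
#    out-of-fold predictions here to avoid information leakage.)
meta = LogisticRegressionCV(penalty="l2", cv=5).fit(P_tr, y_tr)
stacked = meta.predict_proba(P_te)[:, 1]

for name, p in [("soft vote", soft_vote), ("weighted avg", weighted_avg), ("stacked", stacked)]:
    print(name, "accuracy:", ((p > 0.5) == y_te).mean())
```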
Introduction
When making important decisions, we naturally seek different opinions. Translated to prediction, this means consulting different models, each of which makes predictions with some level of uncertainty, inasmuch as any model only approximates the truth. Ensembling takes a set of predictions from individual models and combines them such that the performance of the ensemble is, ideally, better than that of any one of the constituent models. The individual models (the base learners) in an ensemble should ideally show low correlations among their predictions [1,10], as this enables the higher-level ensembling algorithm (the meta-learner) to find a combination of those predictions that improves upon the prediction made by any single base learner. Put another way, ensembling requires the base learners to make different errors on the same observations [10]. If the base learners are highly correlated (i.e., they make very similar predictions on the same observations), theory suggests that the benefits of ensembling are negated.
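To make the notion of base-learner diversity concrete, the short sketch below (illustrative only, not part of the study) computes the pairwise Pearson correlations among base-model predicted probabilities; off-diagonal values near 1 indicate the highly correlated, low-diversity situation described above.

```python
# Minimal sketch: quantify base-learner diversity via pairwise correlations.
# `P` is assumed to hold predicted probabilities, one column per base model;
# here it is filled with near-duplicate placeholder values purely for illustration.
import numpy as np

rng = np.random.default_rng(1)
signal = rng.uniform(size=200)                       # shared component -> induces correlation
P = np.column_stack([np.clip(signal + rng.normal(scale=0.05, size=200), 0, 1)
                     for _ in range(4)])             # four near-duplicate "base models"

corr = np.corrcoef(P, rowvar=False)                  # model-by-model Pearson correlations
print(np.round(corr, 2))                             # off-diagonals near 1 => low diversity
```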