Applications of Python to Evaluate the Performance of Bagging Methods

Akhil Kadiyala, Ashok Kumar

doi:10.1002/ep.13016

Abstract

Ensemble methods are increasingly used in advanced machine learning and analytics to obtain scalable solutions on complex multi-dimensional datasets, because they combine multiple base estimators into an estimator that is more robust than any single estimator built with a given algorithm. Bagging and boosting are the two most widely used ensemble methods. This paper presents a step-by-step approach to applying Python to evaluate the performance of three bagging ensemble methods, namely bagging, random forest, and extremely randomized trees, for predicting in-bus carbon dioxide concentrations. The bagging ensemble model evaluation results from this study were compared with the results of a prior study that evaluated four boosting ensemble methods (gradient boosting machine, light gradient boosting machine, extreme gradient boosting, and adaptive boosting) using the same in-bus database. Among the seven ensemble methods, random forest gave the best results when the predictive models were evaluated with operational performance measures. Readers may adopt the bagging ensemble methods discussed in this article, including the Python code, to address their own data science problems. © 2018 American Institute of Chemical Engineers Environ Prog, 37: 1555–1559, 2018
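As a rough illustration of the kind of comparison the abstract describes, the following minimal sketch fits the three bagging-type ensembles and reports two common error measures. It rests on several assumptions that do not come from the abstract: scikit-learn is used as the library, synthetic data from `make_regression` stands in for the in-bus carbon dioxide database, and R² and RMSE stand in for the paper's operational performance measures. The hyperparameters are arbitrary and are not the authors' settings.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    BaggingRegressor,
    ExtraTreesRegressor,
    RandomForestRegressor,
)
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Assumption: synthetic regression data replaces the in-bus CO2 dataset,
# which is not reproduced here.
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The three bagging-type ensembles named in the abstract.
# BaggingRegressor uses a decision tree as its default base estimator.
models = {
    "bagging": BaggingRegressor(n_estimators=50, random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "extra_trees": ExtraTreesRegressor(n_estimators=50, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    # RMSE computed as the square root of the mean squared error.
    rmse = mean_squared_error(y_test, pred) ** 0.5
    results[name] = {"r2": r2_score(y_test, pred), "rmse": rmse}
    print(f"{name}: R2={results[name]['r2']:.3f}, RMSE={rmse:.2f}")
```

Which model scores best on this synthetic data says nothing about the ranking reported in the paper; the sketch only shows the shape of a like-for-like evaluation on a shared train/test split.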
