Data imputation and comparison of custom ensemble models with existing libraries like XGBoost, CATBoost, AdaBoost and Scikit learn for predictive equipment failure

Tejas Y Deo, Aditya Sanju

doi:10.1016/j.matpr.2022.09.410

Abstract

This paper presents a comparison of custom ensemble models with models trained using existing libraries such as XGBoost, CatBoost, AdaBoost and scikit-learn, for predicting equipment failure in an oil-extraction equipment setup. The dataset used in this paper contains many missing values, and the paper proposes different model-based data imputation strategies to impute them. Various machine-learning-based pre-processing techniques are discussed that handle missing values efficiently without degrading the data. The custom ensemble models proposed in the paper combine a base estimator and a meta classifier. The base estimator is an ensemble model trained on the pre-processed data, and its outputs are used as inputs to the meta classifier. The meta classifier improves generalization because it helps prune the base estimator. The architecture of the custom ensemble models, and the process used to train and test them, are explained in detail.
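The two ideas summarized in the abstract, model-based imputation followed by a base estimator whose outputs feed a meta classifier, can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the synthetic data, the 20% missing rate, the use of `IterativeImputer` as the imputation strategy, a random forest as the base ensemble and logistic regression as the meta classifier are all hypothetical choices made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the equipment-failure data (assumption: binary target).
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5, random_state=0
)

# Knock out roughly 20% of the feature values to mimic missing sensor readings.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.2] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Model-based imputation: each feature with missing values is regressed on
# the other features (scikit-learn's default regressor is BayesianRidge).
imputer = IterativeImputer(random_state=0)

# Base estimator (an ensemble) whose cross-validated predictions become the
# input features of the meta classifier.
stack = StackingClassifier(
    estimators=[("base", RandomForestClassifier(n_estimators=100, random_state=0))],
    final_estimator=LogisticRegression(),
    cv=5,
)

# Imputation is fitted on the training split only, then reused at test time.
model = make_pipeline(imputer, stack)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

Placing the imputer inside the pipeline keeps the test split out of the imputation fit, which avoids leaking information from held-out rows into training.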

Full Text