Ensemble Modelling for Early Breast Cancer Prediction from Diet and Lifestyle

Brindha Senthilkumar, Lal Hmingliana, Lalawmpuii Pachuau, Saia Chenkual, Doris Zodinpuii, John Zohmingthanga, Nachimuthu Senthil Kumar

doi:10.1016/j.ifacol.2022.04.071


Open Access


Abstract

The present work compares the performance of five ensemble models in predicting breast cancer among Mizo women from 15 diet and lifestyle factors. The dataset used to develop the models contains 148 breast cancer cases and 173 healthy individuals. Learning curves are constructed for five classifiers (AdaBoost, gradient boosting, extra trees, bagging, and random forest) to find the best-fitting models for this dataset. Model performance is evaluated by accuracy under 10-fold cross-validation (CV) and leave-one-out cross-validation (LOOCV). The extra trees classifier outperformed the other four ensemble classifiers, with accuracies of 96.3% and 95.5% under LOOCV and CV, respectively. The prediction accuracies from the two cross-validation methods showed a Pearson correlation of 97.45%, which supports the consistency of the ensemble models' performance. The results suggest that lack of physical activity; low intake of fruits, vegetables, and water; high consumption of saum (fermented pork fat), smoked meat and vegetables; habitual chewing of pan (betel leaves with areca nut); and use of the smokeless tobacco products sahadh and tuibur (aqueous tobacco extract) are possible risk factors for breast cancer in the Mizo population.
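The evaluation protocol described in the abstract can be illustrated with a minimal scikit-learn sketch. It is not the authors' code: the data are synthetic placeholders with the same shape as the study's dataset (321 samples, 15 features), and the function names, the stratified and shuffled folds, the hyperparameters, and the random seeds are all assumptions made for illustration. The accuracies it produces say nothing about the reported results.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.model_selection import LeaveOneOut, StratifiedKFold, cross_val_score


def make_placeholder_data(n_samples=321, n_features=15, seed=0):
    """Synthetic stand-in for the survey data (NOT the study's dataset).

    The class weights only roughly mimic 173 controls vs. 148 cases.
    """
    return make_classification(
        n_samples=n_samples,
        n_features=n_features,
        weights=[173 / 321, 148 / 321],
        random_state=seed,
    )


def build_models(n_estimators=50, seed=0):
    """The five ensemble families named in the abstract (assumed settings)."""
    return {
        "AdaBoost": AdaBoostClassifier(n_estimators=n_estimators, random_state=seed),
        "Gradient boosting": GradientBoostingClassifier(
            n_estimators=n_estimators, random_state=seed
        ),
        "Extra trees": ExtraTreesClassifier(n_estimators=n_estimators, random_state=seed),
        "Bagging": BaggingClassifier(n_estimators=n_estimators, random_state=seed),
        "Random forest": RandomForestClassifier(
            n_estimators=n_estimators, random_state=seed
        ),
    }


def compare_models(X, y, n_estimators=50, seed=0):
    """Return {model name: (10-fold CV accuracy, LOOCV accuracy)}."""
    # Stratified, shuffled folds are an assumption; the abstract only says "10-fold".
    kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    results = {}
    for name, model in build_models(n_estimators, seed).items():
        cv_acc = cross_val_score(model, X, y, cv=kfold, scoring="accuracy").mean()
        # LOOCV fits one model per sample; each fold scores 0 or 1, so the mean is accuracy.
        loo_acc = cross_val_score(model, X, y, cv=LeaveOneOut(), scoring="accuracy").mean()
        results[name] = (float(cv_acc), float(loo_acc))
    return results


def pearson_cv_vs_loocv(results):
    """Pearson correlation between the CV and LOOCV accuracies across models."""
    cv_scores, loo_scores = zip(*results.values())
    return float(np.corrcoef(cv_scores, loo_scores)[0, 1])
```

Calling `compare_models(*make_placeholder_data())` returns one accuracy pair per classifier, and `pearson_cv_vs_loocv` summarizes how closely the two validation schemes agree. LOOCV refits every model once per sample, so a smaller `n_estimators` keeps a trial run short.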

Full Text