Abstract

This work compares the performance of five ensemble models in predicting breast cancer from 15 diet and lifestyle factors among Mizo women. The dataset used to develop the models contains 148 breast cancer cases and 173 healthy individuals. Learning curves are constructed for five classifiers (AdaBoost, gradient boosting, extra trees, bagging, and random forest) to identify the best-fitting models for this dataset. Model performance is evaluated by prediction accuracy under 10-fold cross-validation (CV) and leave-one-out cross-validation (LOOCV). The extra trees classifier outperformed the other four ensemble classifiers, with accuracies of 96.3% under LOOCV and 95.5% under CV. The prediction accuracies obtained with the two cross-validation methods have a Pearson correlation coefficient of 97.45%, which supports the consistency of the ensemble models' performance. The results suggest that lack of physical activity, low intake of fruits, vegetables, and water, high consumption of saum (fermented pork fat) and of smoked meat and vegetables, habitual chewing of pan (betel leaves with areca nut), and use of the smokeless tobacco products sahadh and tuibur (aqueous tobacco extract) are possible risk factors for breast cancer in the Mizo population.
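The evaluation protocol described above can be illustrated with a minimal scikit-learn sketch. It is not the authors' code: the data are synthetic (60 samples with 15 features, kept small so LOOCV runs quickly, rather than the study's 321 participants), and the hyperparameters, the use of shuffled stratified folds, and the helper names `make_models` and `evaluate` are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.model_selection import LeaveOneOut, StratifiedKFold, cross_val_score


def make_models(n_estimators=10, seed=0):
    """Build the five ensemble classifiers (hyperparameters are assumed)."""
    return {
        "AdaBoost": AdaBoostClassifier(n_estimators=n_estimators, random_state=seed),
        "Gradient Boost": GradientBoostingClassifier(n_estimators=n_estimators, random_state=seed),
        "Extra trees": ExtraTreesClassifier(n_estimators=n_estimators, random_state=seed),
        "Bagging": BaggingClassifier(n_estimators=n_estimators, random_state=seed),
        "Random forest": RandomForestClassifier(n_estimators=n_estimators, random_state=seed),
    }


def evaluate(models, X, y, seed=0):
    """Return {name: (10-fold CV accuracy, LOOCV accuracy)} for each model."""
    # Shuffled stratified folds are an assumption; the abstract only says "10-fold".
    kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    results = {}
    for name, model in models.items():
        cv_acc = cross_val_score(model, X, y, cv=kfold, scoring="accuracy").mean()
        # Under LOOCV each fold scores 0 or 1, so the mean is the overall accuracy.
        loo_acc = cross_val_score(model, X, y, cv=LeaveOneOut(), scoring="accuracy").mean()
        results[name] = (cv_acc, loo_acc)
    return results


# Synthetic stand-in for the 15 diet and lifestyle factors (not the study data).
X, y = make_classification(
    n_samples=60, n_features=15, n_informative=8, random_state=0
)

results = evaluate(make_models(), X, y)

# Pearson correlation between the per-model CV and LOOCV accuracies.
cv_scores = np.array([pair[0] for pair in results.values()])
loo_scores = np.array([pair[1] for pair in results.values()])
pearson_r = np.corrcoef(cv_scores, loo_scores)[0, 1]

for name, (cv_acc, loo_acc) in results.items():
    print(f"{name:15s} 10-fold CV: {cv_acc:.3f}  LOOCV: {loo_acc:.3f}")
print(f"Pearson r between CV and LOOCV accuracies: {pearson_r:.4f}")
```

With only five models, the Pearson coefficient is computed from five pairs of accuracies, so it is best read as a rough consistency check between the two validation schemes rather than a precise estimate.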

