Abstract

In this paper, a new integrated scheme is proposed to accurately predict breast cancer, help doctors make early diagnoses and treatment plans, and improve patient prognosis. We select five mainstream machine learning models: support vector machine (SVM), artificial neural network (ANN), random forest (RF), extreme gradient boosting (XGBoost), and adaptive boosting (AdaBoost). The Wisconsin Breast Cancer Database (WBCD) and the Wisconsin Diagnostic Breast Cancer (WDBC) dataset are used to investigate the predictive performance of these single and ensemble models. Next, we use multiple linear regression for feature selection (FS). The experimental results show that changing the feature subset significantly affects model performance: on average, the recall and F1-score of the five models improve by 1.19% and 0.84%, respectively. We then apply the whale optimization algorithm (WOA) to optimize the models' hyperparameters and further improve their predictive performance. In the best case, this raises accuracy by 1.02% and precision by 1.82%. In addition, we combine the models by stacking and investigate how the ensemble's performance changes when different models are used as the meta-learner. Finally, the FS-WOA-Stacking model achieves 99.56% accuracy on WBCD and 99.65% accuracy on WDBC, which compares favorably with existing breast cancer prediction models.
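The abstract states that WOA tunes the models' hyperparameters but does not show how the search proceeds. The following is a minimal, standard-library-only sketch of the textbook WOA update rules (shrinking encircling, random-whale exploration, and the bubble-net spiral) minimizing a black-box objective over a box. It is an illustration under stated assumptions, not the authors' implementation: the population size, iteration count, spiral constant `b = 1`, search bounds, and the objective `surrogate_cv_error` (a hypothetical quadratic standing in for the cross-validated error of an SVM over `(log10 C, log10 gamma)`) are all assumptions.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def whale_optimization(
    objective: Callable[[List[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_whales: int = 30,
    n_iter: int = 200,
    b: float = 1.0,
    seed: int = 0,
) -> Tuple[List[float], float, List[float]]:
    """Minimise `objective` inside a box using basic WOA update rules.

    Returns (best position, best fitness, best fitness after each iteration).
    """
    rng = random.Random(seed)

    def clip(x: List[float]) -> List[float]:
        # Pull every coordinate back into its [low, high] range.
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    # Random initial population inside the search box.
    whales = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_whales)]
    fitness = [objective(w) for w in whales]
    best_idx = min(range(n_whales), key=fitness.__getitem__)
    best, best_fit = list(whales[best_idx]), fitness[best_idx]
    history: List[float] = []

    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter  # decreases linearly from 2 towards 0
        for i in range(n_whales):
            x = whales[i]
            A = 2.0 * a * rng.random() - a  # |A| >= 1 is only possible while a >= 1
            C = 2.0 * rng.random()
            p = rng.random()
            l = rng.uniform(-1.0, 1.0)
            if p < 0.5:
                # Encircling: move relative to the best whale (|A| < 1, exploitation)
                # or to a randomly chosen whale (|A| >= 1, exploration).
                leader = best if abs(A) < 1.0 else whales[rng.randrange(n_whales)]
                new_x = [lj - A * abs(C * lj - xj) for lj, xj in zip(leader, x)]
            else:
                # Bubble-net spiral around the best whale.
                s = math.exp(b * l) * math.cos(2.0 * math.pi * l)
                new_x = [abs(bj - xj) * s + bj for bj, xj in zip(best, x)]
            whales[i] = clip(new_x)
            fitness[i] = objective(whales[i])
            if fitness[i] < best_fit:  # elitist: the best fitness never gets worse
                best, best_fit = list(whales[i]), fitness[i]
        history.append(best_fit)
    return best, best_fit, history


def surrogate_cv_error(params: List[float]) -> float:
    # HYPOTHETICAL stand-in for a cross-validated error over
    # (log10 C, log10 gamma) of an SVM; a real objective would fit and score a model.
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


if __name__ == "__main__":
    best, err, _ = whale_optimization(surrogate_cv_error, bounds=[(-3.0, 3.0), (-5.0, 1.0)])
    print(f"log10(C)={best[0]:.3f}  log10(gamma)={best[1]:.3f}  error={err:.2e}")
```

In a real tuning run, `surrogate_cv_error` would presumably be replaced by a function that decodes each whale's position into hyperparameters, trains the model, and returns something like one minus the validation accuracy. Searching over log-scaled values is a common choice for parameters such as `C` and `gamma` that span several orders of magnitude; it is an assumption here, not something the abstract specifies.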
