Abstract

Breast cancer is a major health concern for women worldwide. Effective care and better patient outcomes depend on early detection and accurate risk prediction. Ensemble learning algorithms and big data fusion have recently emerged as promising approaches to predicting and classifying breast cancer risk. This paper investigates the feasibility of using ensemble learning and big data fusion for breast cancer risk prediction and classification. The focus is on building a reliable and accurate model by combining the strengths of several learning algorithms with large and varied data sets. We present an Improved XGBoost Ensembling (I-XGBoost) technique, combined with big data analytics, to predict and diagnose breast cancer. To achieve greater accuracy, the proposed method comprises three identification phases: data pre-processing, feature extraction, and target role. The method was tested on the Wisconsin breast cancer diagnostic data set and compared with other classification methods: Decision Tree, Random Forest, Naive Bayes, K-Nearest Neighbors, Support Vector Machines, AdaBoost, and XGBoost. The study also aims to determine which features are most useful for classifying a tumor as malignant or benign. Implemented with Spark's Python Application Programming Interface (API), I-XGBoost achieves an accuracy of 99.84%. These results underline the promise of ensemble learning and big data fusion for predicting and classifying breast cancer risk. The work adds to the current body of knowledge by proposing a method that draws on the strengths of several models and on the wealth of information available in big data. The positive findings suggest that the proposed approach could be of practical use in clinical settings.
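
The baseline comparison described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes scikit-learn and its bundled copy of the Wisconsin diagnostic data set rather than Spark, uses 5-fold cross-validation as an assumed evaluation protocol, and leaves out XGBoost and I-XGBoost because the abstract does not describe their configuration. The helper names `compare_models` and `top_features` are hypothetical.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Wisconsin diagnostic data: 30 numeric features, target 0 = malignant, 1 = benign.
data = load_breast_cancer()
X, y = data.data, data.target

# Baseline classifiers named in the abstract (default settings are an assumption).
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Support Vector Machine": SVC(),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}


def compare_models(models, X, y, folds=5):
    """Return the mean cross-validated accuracy for each model (hypothetical helper)."""
    scores = {}
    for name, model in models.items():
        # Pre-processing step: standardize features inside each fold to avoid leakage.
        pipeline = make_pipeline(StandardScaler(), model)
        fold_scores = cross_val_score(pipeline, X, y, cv=folds, scoring="accuracy")
        scores[name] = fold_scores.mean()
    return scores


def top_features(X, y, feature_names, k=5):
    """Rank features by random-forest importance (one of several possible criteria)."""
    forest = RandomForestClassifier(random_state=0).fit(X, y)
    ranked = sorted(zip(feature_names, forest.feature_importances_),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


scores = compare_models(models, X, y)
for name, accuracy in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>24}: {accuracy:.4f}")

important = top_features(X, y, data.feature_names)
for feature, importance in important:
    print(f"{feature:>24}: {importance:.3f}")
```

Scores from this sketch are not comparable to the 99.84% reported for I-XGBoost, since the model, the platform, and the evaluation protocol all differ.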
