Abstract

The goal of this research is to create a machine learning (ML) classifier that improves breast cancer (BC) diagnosis and prediction. Principal component analysis (PCA) is used to reduce the dimensionality of the BC dataset and thereby achieve better classification metrics. First, four classifiers (RandomForest, DecisionTree, AdaBoost, and GradientBoosting) were applied to the original BC dataset to determine which performed best. The RandomForest classifier obtained an F1 score of 95.7% and an accuracy of 94.5%; the DecisionTree classifier obtained an F1 score of 93% and an accuracy of 91%; the GradientBoosting classifier obtained an F1 score of 95% and an accuracy of 93.5%; and the AdaBoost classifier obtained an F1 score of 95.8% and an accuracy of 94.5%. Because AdaBoost achieved the highest F1 score and tied for the highest accuracy, it was used to build the final model on the PCA-reduced dataset. This classifier is named “pcaAdaBoost”. The optimized pcaAdaBoost outperformed the other classifiers, achieving an F1 score of 99% and an accuracy of 98.8%. The results show that the optimized pcaAdaBoost scored the highest performance measures in both cross-validation and testing, with an overall accuracy of 99%. These improvements justify the use of dimensionality reduction on high-dimensional datasets to reduce complexity and improve performance.
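The workflow summarized above (compare four classifiers on the full feature set, then train AdaBoost on PCA-reduced features) can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the Wisconsin Diagnostic Breast Cancer data bundled with scikit-learn (`load_breast_cancer`) stands in for the BC dataset, and the 80/20 split, the standardization step, `n_components=10`, and the default hyperparameters are hypothetical choices. It is not expected to reproduce the reported figures.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Assumption: scikit-learn's bundled Wisconsin Diagnostic data stands in for the BC dataset.
# Its labels encode malignant as 0 and benign as 1.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)


def evaluate(model):
    """Fit on the training split; return (F1, accuracy) on the held-out test split.

    f1_score uses its default pos_label=1 (benign in this encoding).
    """
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return f1_score(y_test, y_pred), accuracy_score(y_test, y_pred)


# Step 1: compare the four classifiers on the original (full-dimensional) features.
baselines = {
    "RandomForest": RandomForestClassifier(random_state=42),
    "DecisionTree": DecisionTreeClassifier(random_state=42),
    "AdaBoost": AdaBoostClassifier(random_state=42),
    "GradientBoosting": GradientBoostingClassifier(random_state=42),
}
baseline_scores = {name: evaluate(clf) for name, clf in baselines.items()}

# Step 2: hypothetical pcaAdaBoost pipeline: standardize, project onto the first
# principal components (n_components=10 is an illustrative choice), then boost.
pca_adaboost = make_pipeline(
    StandardScaler(),
    PCA(n_components=10, random_state=42),
    AdaBoostClassifier(random_state=42),
)
cv_accuracy = cross_val_score(
    pca_adaboost, X_train, y_train, cv=5, scoring="accuracy"
).mean()
pca_f1, pca_acc = evaluate(pca_adaboost)

for name, (f1, acc) in baseline_scores.items():
    print(f"{name:>16}: F1={f1:.3f} accuracy={acc:.3f}")
print(
    f"{'pcaAdaBoost':>16}: F1={pca_f1:.3f} accuracy={pca_acc:.3f} "
    f"(5-fold CV accuracy={cv_accuracy:.3f})"
)
```

Keeping the scaler and PCA inside the pipeline means `cross_val_score` refits them on each training fold, so the principal components are never estimated from the validation fold.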
