Abstract

Breast cancer (BC) is a multifactorial disease and is one of the most common cancers globally. This study aimed to compare different machine learning (ML) techniques to develop a comprehensive breast cancer risk prediction model based on features of various factors. The population sample contained 810 records (115 cancer patients and 695 healthy individuals). 45 attributes out of 85 were selected based on the opinion of experts. These selected attributes are in genetic, biochemical, biomarker, gender, demographic and pathological factors. 13 Machine learning models were trained with proposed attributes and coefficient of attributes and internal relationships were calculated. Compared to other methods random forest (RF) has higher performance (accuracy 99.26%, precision 99%,and area under the curve (AUC) 99%). The results of assessing the impact and correlation of variables using the RF method based on PCA indicated that pathology, biomarker,biochemistry, gene, and demographic factors with a coefficient of 0.35, 0.23, 0.15, 0.14, and 0.13 respectively, affected the risk of BC (r2 = 0.54). Breast cancer has several risk factors. Medical experts use these risk factors for early diagnosis. Therefore, identifying related risk factors and their effect can increase the accuracy of diagnosis. Considering the broad features for predicting breast cancer leads to the development of a comprehensive prediction model. In this study, using RF technique a breast cancer prediction model with 99.3% accuracy was developed based on multifactorial features.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call