Abstract

Machine learning (ML) and data mining (DM) algorithms have been used increasingly in recent years for disease risk prediction in the healthcare community. Several traditional feature selection models have been combined with DM and ML algorithms to improve the accuracy of disease risk prediction. In this study, a new Bio-inspired Ensemble Feature Selection (BEFS) model is introduced and applied with DM and ML algorithms. In the BEFS model, the most relevant features, i.e., those contributing most to the prediction, are identified with a bio-inspired algorithm (the genetic algorithm) and an ensemble algorithm (the random forest algorithm). The important features obtained from the proposed model are then combined in various combinations and applied with DM and ML algorithms, here logistic regression (LR) and random forest (RF), and the results obtained are promising. The experiments are implemented in the R programming language and use the Breast Cancer Wisconsin (Diagnostic) dataset from the UCI (University of California, Irvine) ML repository. With the BEFS model, the highest accuracy attained is 96.49%, the AUC (Area Under the Curve) is 96%, and the sensitivity is 98.11%. These results, which substantially improve disease risk prediction, exceed those of several existing works while using only the six most relevant of the dataset's thirty-two features.
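The abstract does not specify how the genetic algorithm encodes or scores feature subsets, so the following is only a minimal sketch of GA-based feature selection under stated assumptions. It is written in Python rather than the R used in the study. Each candidate subset is assumed to be a binary mask over the features, evolved with tournament selection, single-point crossover, bit-flip mutation and elitism; all parameter values are illustrative. The fitness function `toy_fitness` and the `RELEVANCE` scores are hypothetical stand-ins. In a real pipeline the fitness would more plausibly be something like the cross-validated accuracy of LR or RF on the selected columns, and the random forest importance step of BEFS is not reproduced here.

```python
import random
from typing import Callable, List

Mask = List[int]  # 1 = feature kept, 0 = feature dropped


def genetic_feature_selection(
    n_features: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 20,
    generations: int = 40,
    crossover_rate: float = 0.8,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Mask:
    """Return the best feature mask found by a simple generational GA (sketch)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = max(population, key=fitness)[:]

    def tournament() -> Mask:
        # Tournament selection: keep the fitter of two randomly drawn masks.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        next_gen: List[Mask] = [best[:]]  # elitism: the best mask always survives
        while len(next_gen) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation: toggle each gene with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        population = next_gen
        candidate = max(population, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate[:]
    return best


# Hypothetical per-feature relevance scores (e.g., normalized importances);
# NOT values from the study.
RELEVANCE = [0.9, 0.1, 0.8, 0.05, 0.7, 0.0, 0.6, 0.02]


def toy_fitness(mask: Mask) -> float:
    """Hypothetical fitness: total relevance minus 0.2 per kept feature,
    which rewards small subsets of highly relevant features."""
    return sum(r for r, g in zip(RELEVANCE, mask) if g) - 0.2 * sum(mask)


if __name__ == "__main__":
    best = genetic_feature_selection(len(RELEVANCE), toy_fitness)
    print("selected feature indices:", [i for i, g in enumerate(best) if g])
```

The per-feature penalty in the toy fitness mirrors the abstract's goal of keeping only a few highly relevant features. Swapping `toy_fitness` for a classifier-based score changes the cost of each evaluation but not the structure of the search.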
