Abstract
Healthcare data has been growing exponentially in recent years, and the major challenge is to analyze this data and make predictions from it effectively. Feature selection addresses this challenge by choosing a subset of informative features from a high-dimensional dataset, which helps to increase accuracy and remove irrelevant features. In the medical domain, selecting important features is essential because predictions directly affect human health. This study examines several filter, wrapper, and embedded feature selection techniques: generic univariate select, select percentile, select k-best, Pearson correlation coefficient, mutual information, ReliefF, recursive feature elimination, recursive feature elimination with cross-validation, sequential forward selection, sequential backward selection, and select-from-model. The aim is to make a healthcare prediction model, a classification and regression tree, more accurate by employing feature selection methods, so that breast cancer can be detected accurately in its early stages; the data were collected from the Sebha oncology center in the south of Libya. The performance of the classification and regression tree improved noticeably when irrelevant features were eliminated. Using the optimal subset of features identified by recursive feature elimination, our model also outperforms other classification methods, namely logistic regression, naive Bayes, and K-nearest neighbors.
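The wrapper approach the abstract highlights, recursive feature elimination around a classification and regression tree, can be sketched as follows. This is a minimal illustration using scikit-learn, not the authors' pipeline: since the Sebha oncology center data is not publicly available, scikit-learn's bundled Wisconsin breast-cancer dataset stands in, and the choice of 10 retained features is an arbitrary assumption for the example.

```python
# Hedged sketch: RFE wrapped around a CART classifier.
# The dataset and n_features_to_select=10 are stand-in assumptions;
# the paper's actual data comes from the Sebha oncology center.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# RFE repeatedly fits the estimator and drops the weakest feature
# until only the requested number of features remains.
cart = DecisionTreeClassifier(random_state=42)
selector = RFE(estimator=cart, n_features_to_select=10)
selector.fit(X_train, y_train)

# Retrain CART on the selected subset and evaluate on held-out data.
cart.fit(X_train[:, selector.support_], y_train)
acc = cart.score(X_test[:, selector.support_], y_test)
print(f"selected features: {selector.support_.sum()}, accuracy: {acc:.3f}")
```

The same pattern extends to the cross-validated variant (`RFECV`), which chooses the number of retained features automatically instead of fixing it in advance.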