Abstract

Statisticians in both academia and industry routinely encounter problems with high-dimensional data, in which the number of features has grown so quickly that it now outstrips the number of instances. Several established methods exist for selecting features from massive amounts of breast cancer data, yet overfitting remains a problem. Selecting important features with minimal loss across different sample sizes is another area with room for development. Feature selection is therefore crucial for high-dimensional data classification. This paper proposes a new architecture for high-dimensional breast cancer data that combines filtering techniques with a logistic regression model. Essential features are selected using hybrid chi-square and hybrid information gain (hybrid IG) filters, with logistic regression as the classifier. The results show that hybrid IG performed best on high-dimensional breast and prostate cancer data. The top 50 and top 22 features outperformed the other configurations, with the highest classification accuracies of 86.96% and 82.61%, respectively, after integrating hybrid information gain with the logistic function (hybrid IG+LR) at a sample size of 75. In future work, multiclass classification of multidimensional medical data will be evaluated using data from a different domain.
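The filter step described above can be illustrated with a minimal sketch of plain information-gain ranking. The abstract does not specify how the hybrid variant is constructed, so this shows only the standard information-gain score. The median-based binarization, the function names, and the toy data are assumptions made for illustration and are not taken from the paper.

```python
import math
import statistics
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG = H(labels) - H(labels | feature) for one discrete feature."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional


def binarize_at_median(column):
    """Assumed discretization: 1 if the value exceeds the column median."""
    median = statistics.median(column)
    return [1 if v > median else 0 for v in column]


def top_k_features(rows, labels, k):
    """Return the indices of the k features with the highest information gain."""
    columns = list(zip(*rows))
    scores = [information_gain(binarize_at_median(col), labels) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
rows = [[0.1, 5.0], [0.2, 1.0], [0.9, 5.1], [0.8, 0.9]]
labels = [0, 0, 1, 1]
selected = top_k_features(rows, labels, k=1)
```

In a pipeline like the one the abstract outlines, only the columns listed in `selected` would then be passed to a logistic regression classifier, with `k` playing the role of the top-50 or top-22 cutoff.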
