Abstract

Machine learning algorithms are widely applied to biomedical data to classify samples from patients and healthy individuals. High‐dimensional biomedical datasets use a large number of features to represent each sample. However, such datasets may contain redundant, noisy, and irrelevant features, which degrade the classification performance of machine learning algorithms and increase computational overhead. Data normalization and feature selection techniques are therefore used to reduce the impact of noisy features and to identify patterns in the features more accurately, improving the predictive accuracy of diagnosis. This work proposes an efficient feature selection and parameter optimization method for classifying high‐dimensional biomedical datasets. In the proposed method, a binary version of the improved elephant herding optimization (IEHO) algorithm selects features and optimizes the C and γ parameters of the support vector machine classifier. Four variants of the proposed method are presented, each based on a different data normalization technique: Z‐score normalization (ZN), Pareto scaling (PS), tanh‐based normalization (TN), and a variant of tanh‐based normalization (VTN). The proposed variants reduce the dominance of noisy features and explore the feature space to obtain the feature set that maximizes classification accuracy and minimizes time complexity. The performance of the proposed variants is evaluated on 15 high‐dimensional biomedical datasets. Friedman's mean rank test is applied to check for statistical differences among the proposed variants. Results show that the Z‐score normalization‐IEHO (ZN‐IEHO) variant performed significantly better than the other proposed variants in classification accuracy, false‐positive rate, and F‐score. Moreover, the performance of the ZN‐IEHO variant is compared with 18 state‐of‐the‐art feature selection methods. The experimental results demonstrate the effectiveness of the ZN‐IEHO variant in finding the best combination of features and parameters to classify the biomedical datasets accurately.
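
To make the normalization step concrete, the following is a minimal sketch of three of the named techniques applied to a single feature column. It uses the common textbook definitions, which are an assumption here: the abstract does not give the paper's exact formulas, and the function names, the population standard deviation, and the tanh scaling constant `k = 0.01` are all illustrative choices. The VTN variant is left out because the abstract does not define it.

```python
import math
from statistics import mean, pstdev


def z_score(values):
    """Z-score normalization: (x - mean) / std."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def pareto_scale(values):
    """Pareto scaling: (x - mean) / sqrt(std)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / math.sqrt(sigma) for v in values]


def tanh_normalize(values, k=0.01):
    """Tanh-based normalization: 0.5 * (tanh(k * (x - mean) / std) + 1).

    k = 0.01 is an assumed constant, not a value taken from the paper.
    """
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.5 for _ in values]
    return [0.5 * (math.tanh(k * (v - mu) / sigma) + 1.0) for v in values]


# Hypothetical feature column (mean 5, population std 2).
feature = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

zn = z_score(feature)
ps = pareto_scale(feature)
tn = tanh_normalize(feature)
```

Under these definitions, Z-score normalization gives every feature unit variance, Pareto scaling shrinks large-variance features less aggressively, and the tanh form maps every value into the open interval (0, 1), which limits the influence of outliers.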

