Abstract

Advancements in medical technology have produced numerous large datasets with many features. Usually, not all captured features are necessary; some are redundant or irrelevant and reduce the performance of algorithms. To tackle this challenge, many metaheuristic algorithms have been used to select effective features. However, most of them are not effective and scalable enough to select effective features from large medical datasets as well as small ones. Therefore, this paper proposes a binary moth-flame optimization (B-MFO) algorithm to select effective features from both small and large medical datasets. Three categories of B-MFO were developed using S-shaped, V-shaped, and U-shaped transfer functions to convert the canonical MFO from continuous to binary. These categories of B-MFO were evaluated on seven medical datasets, and the results were compared with four well-known binary metaheuristic optimization algorithms: BPSO, bGWO, BDA, and BSSA. In addition, the convergence behavior of B-MFO and the comparative algorithms was assessed, and the results were statistically analyzed using the Friedman test. The experimental results demonstrate the superior performance of B-MFO over the comparative algorithms in solving the feature selection problem for different medical datasets.
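The abstract does not spell out the transfer functions. As a rough sketch, assuming the forms commonly used in the binary-metaheuristic literature (a sigmoid for the S-shaped family, |tanh(x)| for the V-shaped family, and alpha * |x|^beta for the U-shaped family) rather than the exact variants used in the paper, the Python below shows how a continuous moth position could be mapped to a binary feature mask (1 = feature selected). The helper names are hypothetical, and the bit-flip rule for the V- and U-shaped functions is an assumption borrowed from binary PSO variants.

```python
import math
import random
from typing import Callable, List

# Hypothetical helpers: each formula is a common textbook choice for its
# family, not necessarily the exact variant used in B-MFO.


def s_shaped(x: float) -> float:
    """Sigmoid (S-shaped): probability in [0, 1] that the bit is 1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # numerically stable branch for negative x
    return z / (1.0 + z)


def v_shaped(x: float) -> float:
    """|tanh(x)| (V-shaped): 0 at x = 0, approaches 1 as |x| grows."""
    return abs(math.tanh(x))


def u_shaped(x: float, alpha: float = 1.0, beta: float = 2.0) -> float:
    """alpha * |x|**beta (U-shaped); values >= 1 always trigger a flip."""
    return alpha * abs(x) ** beta


def binarize_s(position: List[float], rng: random.Random) -> List[int]:
    """S-shaped rule: each bit is set to 1 with probability S(x)."""
    return [1 if rng.random() < s_shaped(x) else 0 for x in position]


def binarize_flip(position: List[float], old_bits: List[int],
                  transfer: Callable[[float], float],
                  rng: random.Random) -> List[int]:
    """V/U-shaped rule (assumed): flip the previous bit with probability T(x)."""
    return [1 - b if rng.random() < transfer(x) else b
            for x, b in zip(position, old_bits)]


if __name__ == "__main__":
    rng = random.Random(42)
    moth = [-3.0, -0.2, 0.0, 0.4, 2.5]  # toy continuous position
    prev = [1, 0, 1, 0, 1]              # previous binary feature mask
    print("S:", binarize_s(moth, rng))
    print("V:", binarize_flip(moth, prev, v_shaped, rng))
    print("U:", binarize_flip(moth, prev, u_shaped, rng))
```

With the sigmoid rule a bit's new value ignores its old value, whereas the V- and U-shaped rules only decide whether to flip, so a position component near zero tends to leave the current feature selection unchanged.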

Highlights

  • Advances in science and medical technology have produced numerous large medical datasets with many features, some of which are redundant or irrelevant.

  • The proposed binary moth-flame optimization (B-MFO) algorithm was compared with the comparative algorithms using several metrics: average accuracy, standard deviation of accuracy, average fitness, standard deviation of fitness, and average number of selected features.

  • The performance of the k-nearest neighbor (k-NN) classifier was measured using sensitivity and specificity, both derived from the confusion matrix, which records the actual and predicted classifications produced by the classifier.
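The evaluation described in these highlights can be sketched as a wrapper: a candidate binary mask selects feature columns, a k-NN classifier is scored on held-out samples, and sensitivity and specificity are read off the binary confusion matrix. The pure-Python sketch below is an illustration, not the paper's code. The helper names, Euclidean distance, the default k = 5, and the weighted fitness `alpha * error + (1 - alpha) * selected / total` with alpha = 0.99 are all assumptions taken from common wrapper-based feature-selection practice, since the source does not define them here.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def knn_predict(train_x: List[Sequence[float]], train_y: List[int],
                query: Sequence[float], mask: Sequence[int], k: int = 5) -> int:
    """Majority vote among the k nearest training rows, using masked features only."""
    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        return math.sqrt(sum((ai - bi) ** 2
                             for ai, bi, m in zip(a, b, mask) if m))
    ranked = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], query))
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def confusion(actual: List[int], predicted: List[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FN, TN, FP), treating label 1 as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == 1 and p == 1 for a, p in pairs)
    fn = sum(a == 1 and p == 0 for a, p in pairs)
    tn = sum(a == 0 and p == 0 for a, p in pairs)
    fp = sum(a == 0 and p == 1 for a, p in pairs)
    return tp, fn, tn, fp


def evaluate(train_x, train_y, test_x, test_y, mask, k=5, alpha=0.99):
    """Accuracy, sensitivity, specificity and an (assumed) weighted fitness."""
    preds = [knn_predict(train_x, train_y, q, mask, k) for q in test_x]
    tp, fn, tn, fp = confusion(test_y, preds)
    accuracy = (tp + tn) / len(test_y)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    # Lower is better: mostly classification error, lightly penalizing subset size.
    fitness = alpha * (1 - accuracy) + (1 - alpha) * sum(mask) / len(mask)
    return accuracy, sensitivity, specificity, fitness


if __name__ == "__main__":
    # Made-up toy data: two features, binary labels.
    train_x = [[0.1, 5.0], [0.2, 1.0], [0.9, 4.0], [1.0, 2.0]]
    train_y = [0, 0, 1, 1]
    test_x = [[0.15, 3.0], [0.95, 3.0]]
    test_y = [0, 1]
    print(evaluate(train_x, train_y, test_x, test_y, mask=[1, 0], k=1))
    print(evaluate(train_x, train_y, test_x, test_y, mask=[0, 1], k=1))
```

On this toy data, keeping only the first feature classifies both test samples correctly, while keeping only the second yields perfect sensitivity but zero specificity, which is why both metrics are reported rather than accuracy alone. Averaging the returned accuracy and fitness over repeated runs (for example with `statistics.mean` and `statistics.stdev`) gives the kind of summary metrics listed above.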

Introduction

With advances in science and medical technology, numerous large medical datasets with many features have been created, and these datasets contain redundant and irrelevant features. Data-driven decision making for high-risk diseases such as heart disease [1] is a significant trend, for which many data mining and machine learning methods have been introduced [2]. Since medical data are obtained from multiple sources, not all captured features are necessary; some are irrelevant or redundant, which may reduce the performance of algorithms in data-driven decision-making software. For example, the FSBRR algorithm [3] removes the radius feature from the Breast Cancer Wisconsin dataset as redundant because it is highly correlated with the smoothness feature. Feature selection is used in a variety of real-world applications such as disease diagnosis [4,5], email spam detection [6], text clustering [7,8], and human activity recognition [9].
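The FSBRR example rests on a simple idea: if two features are strongly correlated, one of them adds little information. The sketch below illustrates that idea with a generic greedy Pearson-correlation filter. It is not the FSBRR algorithm; the 0.9 threshold and the helper names are assumptions, and the toy columns are made-up values rather than Breast Cancer Wisconsin data.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / (sx * sy)


def drop_redundant(columns: Dict[str, List[float]],
                   threshold: float = 0.9) -> List[str]:
    """Greedily keep a feature only if |r| with every kept feature is below threshold."""
    kept: List[str] = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    toy = {
        "feature_a": [1.0, 2.0, 3.0, 4.0, 5.0],
        "feature_b": [2.1, 3.9, 6.2, 8.0, 9.9],  # roughly 2 * feature_a
        "feature_c": [5.0, 1.0, 4.0, 2.0, 3.0],
    }
    print(drop_redundant(toy))  # prints ['feature_a', 'feature_c']
```

A filter like this only removes pairwise redundancy and never consults a classifier; wrapper methods such as B-MFO instead search over feature subsets and score each one with a learning algorithm.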
