Abstract

High-dimensional biomedical datasets contain thousands of features that can be used in molecular disease diagnosis; however, many of these features are irrelevant or only weakly correlated with the outcome, and they degrade predictive accuracy. Feature selection algorithms help classification techniques identify patterns accurately by finding a subset of the original features that preserves predictive classification accuracy while reducing the computational overhead of data mining. In this paper, we present an improved shuffled frog leaping algorithm (ISFLA) that explores the space of possible subsets to obtain the set of features that maximizes predictive accuracy and minimizes the number of irrelevant features in high-dimensional biomedical data. The evaluation employs the K-nearest neighbour approach, and a comparative analysis with a genetic algorithm, particle swarm optimization and the shuffled frog leaping algorithm shows that the improved algorithm achieves better identification of relevant subsets and higher classification accuracy.
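
As an illustration of how a wrapper-style search can score a candidate feature subset with a K-nearest neighbour classifier, the following minimal sketch may help. It is not the ISFLA implementation: the function names `knn_loo_accuracy` and `fitness`, the leave-one-out protocol, the weight `alpha` and the toy data are all assumptions made for this example. A search procedure would call such a function repeatedly on different binary masks.

```python
import math
from collections import Counter


def knn_loo_accuracy(X, y, mask, k=3):
    """Leave-one-out accuracy of k-NN using only features where mask[j] is 1."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    correct = 0
    for i, xi in enumerate(X):
        # Euclidean distance from sample i to every other sample,
        # computed on the selected features only.
        dists = []
        for m, xm in enumerate(X):
            if m == i:
                continue
            d = math.sqrt(sum((xi[j] - xm[j]) ** 2 for j in cols))
            dists.append((d, y[m]))
        dists.sort(key=lambda t: t[0])
        votes = Counter(label for _, label in dists[:k])
        if votes.most_common(1)[0][0] == y[i]:
            correct += 1
    return correct / len(X)


def fitness(X, y, mask, k=3, alpha=0.9):
    """Hypothetical objective: reward accuracy, penalize the subset size.

    alpha is an assumed trade-off weight, not a value taken from the paper.
    """
    acc = knn_loo_accuracy(X, y, mask, k)
    kept = sum(mask) / len(mask)
    return alpha * acc + (1 - alpha) * (1 - kept)


# Toy data (invented): feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.0], [0.1, -3.0], [0.2, 4.0], [1.0, -4.0], [1.1, 5.0], [1.2, -3.5]]
y = [0, 0, 0, 1, 1, 1]
informative = fitness(X, y, [1, 0], k=1)  # keep only the informative feature
noisy = fitness(X, y, [0, 1], k=1)        # keep only the noise feature
```

Under these assumptions, the mask that keeps only the informative feature scores higher than the mask that keeps only the noise feature, which is the signal a subset search would exploit.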
