Abstract

High-dimensional datasets often suffer from the curse of dimensionality: beyond a certain number of features, classification time may increase and classification accuracy may decrease. Feature selection is therefore used to discard redundant features and improve classification. However, no single feature selection method works well on all datasets. This paper therefore proposes an automatic hybrid feature selection method, combining filter and wrapper approaches, called Extended Mutual Congestion-Discrete Weighted Evolution Strategy (EMC-DWES). First, Extended Mutual Congestion (EMC) is proposed as a frequency-based filter ranker that discards irrelevant and redundant features using the intrinsic statistics of the features. Second, Discrete Weighted Evolution Strategy (DWES) is applied to the features retained by EMC to perform the final automatic feature selection as a wrapper method. DWES clusters the features and applies mutation both to select the most relevant feature in each cluster, one at a time, and to avoid selecting redundant features together, which it does by assigning greater weights to the most informative clusters. The performance of EMC-DWES, in terms of maximizing classification accuracy and minimizing the size of the selected subset, is investigated on benchmark high-dimensional medical datasets, including Covid-19 data. The superiority of EMC-DWES over state-of-the-art methods is also evaluated on all datasets. The implementation of EMC-DWES is available at https://github.com/KhaosResearch/EMC-DWES.
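
To make the filter-then-wrapper structure concrete, the following is a minimal sketch of a generic two-stage hybrid selector. It is not the EMC-DWES implementation. The abstract does not define the EMC score or the DWES clustering and weighting scheme, so both are replaced by stand-ins: the filter ranks features by the absolute difference of class means, and the wrapper is a plain (1+1) evolution strategy with single bit-flip mutation, scored by the leave-one-out accuracy of a nearest-centroid classifier. All function names (`filter_rank`, `loo_accuracy`, `wrapper_search`) are hypothetical, and binary labels 0/1 with at least two samples per class are assumed.

```python
import random
from statistics import mean


def filter_rank(X, y, keep):
    """Stand-in filter (NOT the EMC score): rank features by the absolute
    difference of the two class means and keep the top `keep` indices."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(mean(pos) - mean(neg)))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:keep]


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier that sees
    only the given feature indices. An empty subset scores 0.0."""
    if not features:
        return 0.0
    correct = 0
    for i in range(len(X)):
        dist = {}
        for label in (0, 1):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            centroid = [mean(r[j] for r in rows) for j in features]
            dist[label] = sum((X[i][j] - c) ** 2 for j, c in zip(features, centroid))
        correct += min(dist, key=dist.get) == y[i]
    return correct / len(X)


def wrapper_search(X, y, candidates, generations=50, seed=0):
    """Stand-in wrapper (NOT DWES: no clustering, no cluster weights).
    A (1+1) evolution strategy over a boolean mask of the candidates."""
    rng = random.Random(seed)

    def fitness(mask):
        chosen = [f for f, bit in zip(candidates, mask) if bit]
        # Higher accuracy wins; fewer features breaks ties.
        return (loo_accuracy(X, y, chosen), -len(chosen))

    parent = [True] * len(candidates)
    best = fitness(parent)
    for _ in range(generations):
        child = parent[:]
        flip = rng.randrange(len(candidates))
        child[flip] = not child[flip]  # single bit-flip mutation
        score = fitness(child)
        if score >= best:
            parent, best = child, score
    selected = [f for f, bit in zip(candidates, parent) if bit]
    return selected, best[0]
```

A typical call would be `candidates = filter_rank(X, y, keep=k)` followed by `selected, accuracy = wrapper_search(X, y, candidates)`. The tuple-valued fitness mirrors the two objectives named in the abstract, maximizing accuracy and minimizing subset length, by comparing accuracy first and subset size second.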
