Abstract

Feature selection plays an important role in many fields, particularly in bioinformatics and microarray gene expression data analysis, where it is used to choose discriminative genes from high-dimensional datasets. Selecting a subset of highly relevant features with low redundancy can lead to better prediction models. This study proposes a new feature selection method, PCRWG, that integrates Preordonnances theory, through new Relevance and Complementarity criteria introduced here, with connectivity in undirected Weighted Graphs. The method can handle high-dimensional data and retains relevant and complementary features in order to select effective features in large-scale gene datasets. The proposed algorithm operates in two phases: filtering and wrapping. The strength of the first phase is that it is preceded by a step that further reduces the number of predictors by removing those in disagreement with the target, according to the newly proposed relevance criterion. The proposed heuristic then uses the relevance-complementarity ratio between preordonnances to automatically update the compromise rule between relevance and complementarity. In the wrapping phase, the suggested graph-based approach uses maximal cliques and relies on a relevance-complementarity matrix to consolidate edges: two connected, interdependent features are complementary to each other and can have high discriminative power when they serve as a group. Existing graph-based feature selection algorithms do not consider relevance and complementarity simultaneously. The experiments were carried out on three simulated scenarios and on thirteen popular cancer microarray gene expression datasets, of which eight are binary and five are multi-class. A 10-fold cross-validation was used to evaluate the Support Vector Machine (SVM), Naive Bayes (NB) and artificial Neural Network (NN) classifiers. The empirical results demonstrate the high performance of the proposed hybrid approach compared with methods from recently published articles.
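
The graph-based idea behind the wrapping phase can be illustrated with a minimal sketch. It is not the PCRWG algorithm itself: the abstract does not define the relevance-complementarity matrix, so the matrix `W`, the edge `threshold`, and the rule of keeping the maximal clique with the largest total edge weight are all assumptions made for illustration. The sketch builds an undirected weighted graph over features, enumerates maximal cliques with the standard Bron-Kerbosch procedure, and returns one clique as the candidate feature group.

```python
from itertools import combinations


def maximal_cliques(adj):
    """Enumerate maximal cliques of an undirected graph (Bron-Kerbosch with pivoting).

    adj maps each node to the set of its neighbours.
    """
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(sorted(r))  # r cannot be extended: it is maximal
            return
        # Pivot on the node with the most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def select_features(weights, threshold):
    """Return the maximal clique with the largest total edge weight.

    weights   -- symmetric matrix of pairwise scores (hypothetical stand-in
                 for a relevance-complementarity matrix)
    threshold -- hypothetical cut-off: pairs scoring below it get no edge
    """
    n = len(weights)
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if weights[i][j] >= threshold:
            adj[i].add(j)
            adj[j].add(i)

    def score(clique):
        return sum(weights[i][j] for i, j in combinations(clique, 2))

    return max(maximal_cliques(adj), key=score)


# Toy scores for five features (made-up numbers, for illustration only).
W = [
    [0.0, 0.9, 0.8, 0.1, 0.2],
    [0.9, 0.0, 0.7, 0.2, 0.1],
    [0.8, 0.7, 0.0, 0.6, 0.1],
    [0.1, 0.2, 0.6, 0.0, 0.3],
    [0.2, 0.1, 0.1, 0.3, 0.0],
]

selected = select_features(W, threshold=0.5)
print(selected)  # [0, 1, 2]
```

With these toy scores the graph has edges 0-1, 0-2, 1-2 and 2-3, so the maximal cliques are {0, 1, 2}, {2, 3} and the isolated feature {4}; the first has the largest total weight and is returned. In a real pipeline the selected group would then be passed to a classifier and scored by cross-validation.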
