Abstract

High-dimensional microarray datasets suffer from small sample sizes and extremely large numbers of features, so feature selection plays a crucial role in the performance of models trained on them. A typical feature selection method has two main parts: an evaluation criterion and a search strategy. Common datasets do not have a huge number of features relative to their number of samples, so a search strategy can feasibly explore their feature-subset space. In contrast, high-dimensional microarray datasets have so many features that the search space is vast and searching it is prohibitive. In this paper, we apply the philosophy of Occam's razor to feature subset selection in order to free high-dimensional datasets from computational search methods. The proposed method selects features in two stages: in the first stage, features are reordered by their importance in the dataset; in the second stage, the fundamental concept of reduced row echelon form is applied to the dataset to find linearly independent features. To assess the effectiveness of the proposed method, experiments are carried out on nine binary high-dimensional microarray datasets. The results are compared with eleven state-of-the-art feature selection algorithms, including Correlation-based Feature Selection (CFS), Fast Correlation-Based Filter (FCBF), Interact (INT), and Maximum Relevance Minimum Redundancy (MRMR). The averaged results are analyzed with a non-parametric statistical test, which reveals that the proposed method significantly outperforms the others in terms of accuracy, sensitivity, specificity, G-mean, number of selected features, and computational complexity.
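The two-stage idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the importance scores, the tolerance, and the helper name `select_independent_features` are assumptions. It reorders the feature columns by importance, runs Gaussian elimination toward reduced row echelon form on the sample-by-feature matrix, and keeps the pivot columns, i.e. a linearly independent subset of features.

```python
import numpy as np

def select_independent_features(X, importance, tol=1e-8):
    """Keep a linearly independent subset of columns of X,
    preferring columns with higher importance scores.
    (Hypothetical sketch of the abstract's two-stage method.)"""
    # Stage 1: reorder features so the most important come first.
    order = np.argsort(importance)[::-1]
    A = X[:, order].astype(float).copy()
    n_rows, n_cols = A.shape
    pivots, row = [], 0
    # Stage 2: reduce toward RREF; pivot columns are independent features.
    for col in range(n_cols):
        if row >= n_rows:
            break
        # Partial pivoting: largest remaining entry in this column.
        p = row + np.argmax(np.abs(A[row:, col]))
        if abs(A[p, col]) < tol:
            continue  # column is linearly dependent on earlier pivots; skip it
        A[[row, p]] = A[[p, row]]          # swap pivot row into place
        A[row] /= A[row, col]              # normalize pivot to 1
        mask = np.arange(n_rows) != row
        A[mask] -= np.outer(A[mask, col], A[row])  # eliminate above and below
        pivots.append(int(order[col]))     # record original feature index
        row += 1
    return pivots

# Toy data: 4 samples, 4 features; feature 2 duplicates feature 0.
X = np.array([[1., 0., 1., 2.],
              [0., 1., 0., 1.],
              [1., 1., 1., 0.],
              [0., 2., 0., 1.]])
importance = np.array([0.9, 0.8, 0.7, 0.6])
print(select_independent_features(X, importance))  # → [0, 1, 3]
```

The redundant duplicate column is dropped because it can never become a pivot once its twin has been reduced, which is the sense in which RREF replaces an explicit search over feature subsets.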
