Identification of potential biomarkers on microarray data using distributed gene selection approach

Alok Kumar Shukla, Diwakar Tripathi

doi:10.1016/j.mbs.2019.108230

Abstract

In recent times, several feature selection (FS) methods have been introduced to identify biomarkers from gene expression datasets. These methods have gained extensive attention for solving the cancer classification problem, but they have some limitations. First, the majority of FS approaches incur a high computational cost because they operate on a centralized data structure. Second, a gene that is ranked as irrelevant on its own, but that could perform well in terms of classification accuracy when combined with a suitable subset of genes, is left out of the selection. To resolve these problems, we introduce a novel two-stage FS approach that combines Spearman's Correlation (SC) with distributed filter FS methods and can select highly discriminative genes for distinguishing samples in high-dimensional datasets. In the distributed FS stage, the data are partitioned by features (vertical distribution), and a merging procedure then updates the feature subset along with improved classification accuracy. Moreover, the approach quantifies the gene-gene and gene-class relations and simultaneously detects subsets of essential genes. The proposed method is verified on six gene datasets with four well-known classifiers, namely support vector machine, naïve Bayes, k-nearest neighbor, and decision tree. Its performance is compared with traditional filter techniques such as Relief-F, information gain, minimum redundancy maximum relevance, joint mutual information, Chi-square, and t-test. The experimental results demonstrate that the proposed method significantly improves computational time and classification accuracy in comparison with standard algorithms applied to the non-partitioned dataset.
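The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the following are assumptions made only for illustration: the function names, the toy data, the use of |Spearman correlation with the class label| as the per-partition filter score, a leave-one-out nearest-centroid classifier standing in for the four classifiers named above, and a merge rule that keeps a partition's genes only when accuracy increases.

```python
import numpy as np

def average_ranks(x):
    """1-based ranks of x, with tied values sharing their average rank."""
    _, inv, counts = np.unique(np.asarray(x), return_inverse=True, return_counts=True)
    cum = np.cumsum(counts)
    avg = cum - (counts - 1) / 2.0  # mean rank of each group of ties
    return avg[inv]

def spearman(a, b):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    ra = ra - ra.mean()
    rb = rb - rb.mean()
    denom = np.sqrt((ra ** 2).sum() * (rb ** 2).sum())
    return 0.0 if denom == 0 else float((ra * rb).sum() / denom)

def stage1_spearman_filter(X, y, keep):
    """Stage 1 (assumed form): keep the genes most correlated with the class."""
    scores = np.array([abs(spearman(X[:, j], y)) for j in range(X.shape[1])])
    return np.argsort(-scores)[:keep]

def loo_centroid_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier (stand-in only).
    Assumes every class keeps at least one sample when one is held out."""
    classes = np.unique(y)
    correct = 0
    for i in range(len(y)):
        mask = np.ones(len(y), dtype=bool)
        mask[i] = False
        cents = np.array([X[mask & (y == c)].mean(axis=0) for c in classes])
        pred = classes[np.argmin(((cents - X[i]) ** 2).sum(axis=1))]
        correct += int(pred == y[i])
    return correct / len(y)

def distributed_merge(X, y, candidates, n_parts, per_part, seed=0):
    """Stage 2 (assumed form): split genes vertically into disjoint partitions,
    filter each partition locally, and merge a partition's genes into the
    running subset only if the classification accuracy increases."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(candidates), n_parts)
    selected, best = [], -1.0
    for part in parts:
        scores = np.array([abs(spearman(X[:, j], y)) for j in part])
        local = [int(j) for j in part[np.argsort(-scores)[:per_part]]]
        trial = selected + local
        acc = loo_centroid_accuracy(X[:, trial], y)
        if acc > best:  # hypothetical merge rule
            selected, best = trial, acc
    return selected, best

def make_toy_data(n_samples=40, n_genes=200, n_informative=5, shift=4.0, seed=0):
    """Synthetic two-class expression matrix; only the first genes carry signal."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], n_samples // 2)
    X = rng.normal(size=(n_samples, n_genes))
    X[y == 1, :n_informative] += shift
    return X, y

if __name__ == "__main__":
    X, y = make_toy_data()
    candidates = stage1_spearman_filter(X, y, keep=20)
    genes, acc = distributed_merge(X, y, candidates, n_parts=4, per_part=2)
    print("selected genes:", genes, "LOO accuracy:", acc)
```

Because each partition is scored independently, the per-partition filtering step could in principle run in parallel; the sequential loop here is only for readability.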

Full Text