A consensus multi-view multi-objective gene selection approach for improved sample classification

Sudipta Acharya, Laizhong Cui, Yi Pan

doi:10.1186/s12859-020-03681-5

Open Access

https://doi.org/10.1186/s12859-020-03681-5

Journal: BMC Bioinformatics
Publication Date: Sep 1, 2020
Citations: 4
License type: open-access

Affiliation: Shenzhen University, Georgia State University

Abstract

Background

In computational biology, analyzing complex data helps extract relevant biological information. Sample classification of gene expression data is one such popular bio-data analysis technique. However, the presence of a large number of irrelevant or redundant genes in expression data makes sample classification algorithms work inefficiently. Feature selection is a dimensionality reduction technique that helps maximize the effectiveness of a sample classification algorithm. Recent advances in biotechnology have enriched biological data to include multiple modalities, or views: different ‘omics’ resources capture different, equally important biological properties of entities. However, most existing feature selection methods consider only one of these multiple biological resources. Consequently, crucial aspects of the available biological knowledge, which could further improve feature selection, may be ignored.

Results

In this work, we propose a Consensus Multi-View Multi-objective Clustering-based feature selection algorithm called CMVMC. Three controlled genomic and proteomic resources, namely gene expression, Gene Ontology (GO), and the protein-protein interaction network (PPIN), are used to build two independent views. Multi-objective consensus clustering is applied within the proposed gene selection method to satisfy both incorporated views. Two gene expression data sets, Multiple Tissues and Yeast, from two different organisms (Homo sapiens and Saccharomyces cerevisiae, respectively) are used in the experiments. As the end product of CMVMC, a reduced set of relevant and non-redundant genes is obtained for each data set. These genes are finally used for effective sample classification.

Conclusions

The experimental study on the chosen data sets shows that the proposed feature selection method improves sample classification accuracy while substantially reducing the gene space. For the Multiple Tissues data set, CMVMC reduces the number of genes (features) from 5565 to 41, with 92.73% sample classification accuracy. For the Yeast data set, the number of genes is reduced from 2884 to 10, with 95.84% sample classification accuracy. Two internal cluster validity indices, Silhouette and Davies-Bouldin (DB), and one external validity index, Classification Accuracy (CA), are used for the comparative study. The reported results are further validated through a well-known biological significance test and a visualization tool.
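The comparative study relies on two internal cluster validity indices and one external index. The sketch below shows how each is commonly computed; it is a minimal illustration, not the authors' evaluation code. It assumes Euclidean distance, the standard mean-silhouette definition (singleton clusters scored 0), and the common Davies-Bouldin variant that measures cluster scatter as the average point-to-centroid distance. The function names are hypothetical.

```python
import math
from collections import defaultdict


def _group(labels):
    """Map each cluster label to the indices of its members."""
    groups = defaultdict(list)
    for i, lab in enumerate(labels):
        groups[lab].append(i)
    return groups


def silhouette(points, labels):
    """Mean silhouette width in [-1, 1]; higher means tighter, better-separated clusters.
    Assumes at least two clusters."""
    groups = _group(labels)
    scores = []
    for i, p in enumerate(points):
        own = groups[labels[i]]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for lab, members in groups.items()
            if lab != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def davies_bouldin(points, labels):
    """Davies-Bouldin index; lower means more compact, better-separated clusters."""
    groups = _group(labels)
    centroids, scatter = {}, {}
    for lab, members in groups.items():
        cols = zip(*(points[j] for j in members))
        centroids[lab] = tuple(sum(c) / len(members) for c in cols)
        # scatter: mean distance of the members to their own centroid
        scatter[lab] = sum(math.dist(points[j], centroids[lab]) for j in members) / len(members)
    # for each cluster, its worst (largest) similarity ratio to any other cluster
    worst = [
        max((scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in groups if j != i)
        for i in groups
    ]
    return sum(worst) / len(worst)


def classification_accuracy(predicted, actual):
    """Fraction of samples whose predicted class matches the true class."""
    return sum(p == t for p, t in zip(predicted, actual)) / len(actual)
```

For example, on the points `(0, 0), (0, 1), (10, 0), (10, 1)` with labels `[0, 0, 1, 1]`, `silhouette` is about 0.90 and `davies_bouldin` returns 0.1; grouping the same points as `[0, 1, 0, 1]` drives the silhouette below zero.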

Highlights

In the field of computational biology, analyzing complex data helps to extract relevant biological information
The experimental study on the chosen data sets shows that the proposed feature selection method improves sample classification accuracy while substantially reducing the gene space
In the case of the Multiple Tissues data set, Consensus Multi-View Multi-objective Clustering (CMVMC) reduces the number of genes from 5565 to 41, with 92.73% sample classification accuracy
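CMVMC's multi-objective consensus clustering is not detailed here, so the sketch below swaps in a deliberately simple stand-in to illustrate only the general idea of clustering-based gene reduction: group genes so that redundant ones share a cluster, then keep one representative per cluster. It assumes each view has already produced a partition of the genes (for example, one from expression data and one from GO/PPIN similarity), uses the intersection of those partitions as the consensus (two genes share a consensus cluster only if every view groups them together), and picks each cluster's medoid by Euclidean distance between expression profiles. The function names and this consensus rule are hypothetical, not the published algorithm.

```python
import math
from collections import defaultdict


def consensus_partition(*view_labels):
    """Intersection of several partitions of the same genes: two genes share a
    consensus cluster only if every view assigns them the same label."""
    groups = defaultdict(list)
    for gene, key in enumerate(zip(*view_labels)):
        groups[key].append(gene)
    return list(groups.values())


def medoid(members, profiles):
    """Member gene whose expression profile has the smallest total Euclidean
    distance to the other members' profiles (ties go to the first member)."""
    return min(members, key=lambda g: sum(math.dist(profiles[g], profiles[h]) for h in members))


def select_representatives(profiles, *view_labels):
    """Reduced gene set: one medoid gene index per consensus cluster."""
    return sorted(medoid(members, profiles) for members in consensus_partition(*view_labels))
```

With six genes, view labels `[0, 0, 0, 0, 1, 1]` and `[0, 0, 0, 1, 1, 1]` give the consensus clusters `[[0, 1, 2], [3], [4, 5]]`, so six genes shrink to three representatives. Requiring agreement in every view is strict; with many views, thresholding a co-association matrix (agreement in most views) is a common softer alternative.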

Summary

Introduction

In the field of computational biology, analyzing complex data helps extract relevant biological information. However, the presence of a large number of irrelevant or redundant genes in expression data makes sample classification algorithms work inefficiently; more generally, the high dimensionality of gene expression data sets degrades the performance of data-analysis techniques. Feature selection is one dimensionality reduction technique that helps maximize the effectiveness of a sample classification algorithm. In the past several years, researchers have proposed dimensionality reduction methods following different strategies [1,2,3,4]. Dimensionality reduction can be done in two ways: 1) feature extraction, which combines available features to create new features, and 2) feature selection, which eliminates irrelevant features and keeps a smaller subset of the available features. Genes in expression data sets are treated as features; throughout the paper, we use the terms ‘gene selection’ and ‘feature selection’ interchangeably.
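To make the distinction between the two strategies concrete, here is a minimal sketch on a toy samples-by-genes matrix. Feature selection keeps some original gene columns unchanged, so the result is still interpretable as genes; feature extraction builds new features from combinations of genes. The plain mean used for extraction is only a stand-in for real extraction methods such as principal component analysis, which learn weighted combinations; the function names are hypothetical.

```python
def select_features(matrix, keep):
    """Feature selection: keep only the listed gene columns, values unchanged."""
    return [[row[j] for j in keep] for row in matrix]


def extract_mean_feature(matrix, combine):
    """Feature extraction (toy version): replace the listed gene columns with a
    single new feature, their per-sample mean."""
    return [[sum(row[j] for j in combine) / len(combine)] for row in matrix]


# Two samples (rows) by three genes (columns).
expression = [[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]]

print(select_features(expression, keep=[0, 2]))             # [[1.0, 3.0], [4.0, 6.0]]
print(extract_mean_feature(expression, combine=[0, 1, 2]))  # [[2.0], [5.0]]
```

The selected output still refers to genes 0 and 2 by identity, which is why gene selection is preferred when the chosen genes themselves must be examined biologically.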

Similar Papers


A Refined 3-in-1 Fused Protein Similarity Measure: Application in Threshold-Free Hub Detection.
Sudipta Acharya ... Laizhong Cui
IEEE/ACM transactions on computational biology and bioinformatics | VOL. 19
13 Feb 2020

A kernel semi-supervised distance metric learning with relative distance: Integration with a MOO approach
Rakesh Kumar Sanodiya ... Jimson Mathew
Expert Systems With Applications | VOL. 125
18 Jan 2019

A Review on Feature Selection Techniques for Gene Expression Data
S Vanjimalar ... P Manikandan
01 Dec 2018

VizCluster and its Application on Classifying Gene Expression Data
Li Zhang ... Murali Ramanathan
Distributed and Parallel Databases | VOL. 13
01 Jan 2003
