Multi-view feature selection for identifying gene markers: a diversified biological data driven approach

Sudipta Acharya, Laizhong Cui, Yi Pan

doi:10.1186/s12859-020-03810-0

Open Access

https://doi.org/10.1186/s12859-020-03810-0

Journal: BMC Bioinformatics
Publication Date: Dec 1, 2020
Citations: 4
License type: open-access

Affiliation: Shenzhen University, Georgia State University

Abstract

Background: In recent years, the use of multiple genomic and proteomic sources to investigate challenging bioinformatics problems has become immensely popular among researchers. One such problem is feature (gene) selection: identifying relevant and non-redundant marker genes from high-dimensional gene expression data sets. In that context, designing an efficient feature selection algorithm that exploits knowledge from multiple biological resources may be an effective way to understand the spectrum of cancer or other diseases, with applications in specific epidemiology for a particular population.

Results: In the current article, we formulate feature selection and marker gene detection as a multi-view multi-objective clustering problem. To that end, we propose an Unsupervised Multi-View Multi-Objective clustering-based gene selection approach called UMVMO-select. Three important resources of biological data (gene ontology, protein interaction data, protein sequence), along with gene expression values, are collectively utilized to design two different views. UMVMO-select aims to reduce the gene space with little or no compromise in sample classification efficiency, and determines relevant and non-redundant gene markers from three cancer gene expression benchmark data sets.

Conclusion: A thorough comparative analysis has been performed against five clustering methods and nine existing feature selection methods with respect to several internal and external validity metrics. The obtained results indicate the superiority of the proposed method. The reported results are also validated through a biological significance test and heatmap plotting.

Highlights

In recent years, the use of multiple genomic and proteomic sources to investigate challenging bioinformatics problems has become immensely popular among researchers.
This indicates that genes within the same obtained cluster are more involved in similar biological processes than the remaining genes in the genome.
The obtained gene clusters are validated biologically through Gene Ontology (GO) enrichment analysis, and the test outcome is reported in Table 5 for two randomly chosen clusters from the best ensemble solution.

Summary

Introduction

The use of multiple genomic and proteomic sources to investigate challenging bioinformatics problems has become immensely popular among researchers. One such problem is feature (gene) selection: identifying relevant and non-redundant marker genes from high-dimensional gene expression data sets. As a popular remedy for the ‘curse of dimensionality’ [1], researchers have proposed various gene (or feature) selection methods [2,3,4] over the past few years. Those methods aim to discard redundant genes from expression data sets and keep only a smaller subset of relevant genes that effectively participate in sample classification. Existing research indicates that genetic markers are highly involved in different cancer pathways; they can be useful for diagnosis and for assessing drug efficacy and toxicity.
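
To make the idea of discarding redundant genes concrete, the following is a minimal, illustrative sketch in Python. It is not UMVMO-select: it uses a single view (expression values only), a simple greedy leader clustering with absolute Pearson correlation, and variance as a crude relevance proxy. The function names, the 0.9 threshold, and the toy data are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / denom


def select_representative_genes(expression, threshold=0.9):
    """Greedy leader clustering over genes (hypothetical helper, not the paper's method).

    expression: dict mapping gene name -> list of expression values, one per sample.
    Returns a dict mapping each kept gene (cluster leader) -> genes in its cluster.
    """
    # Visit high-variance genes first; variance as a relevance proxy is an assumption.
    order = sorted(expression, key=lambda g: pvariance(expression[g]), reverse=True)
    clusters = {}
    for gene in order:
        best, best_r = None, threshold
        for leader in clusters:
            r = abs(pearson(expression[gene], expression[leader]))
            if r >= best_r:
                best, best_r = leader, r
        if best is None:
            clusters[gene] = [gene]      # no similar leader: gene starts a new cluster
        else:
            clusters[best].append(gene)  # redundant with an existing leader
    return clusters


# Toy expression matrix (made-up values): three genes measured over four samples.
toy = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with geneA
    "geneC": [5.0, 1.0, 4.0, 2.0],
}
clusters = select_representative_genes(toy)
selected = sorted(clusters)
print(selected)
```

On this toy input, geneA is absorbed into the cluster led by geneB, so only geneB and geneC are kept. A multi-view, multi-objective method such as the one described in the abstract would replace both the single correlation measure and the greedy assignment with view-specific similarities and an optimization over several criteria.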
