Abstract

BackgroundIn recent years, to investigate challenging bioinformatics problems, the utilization of multiple genomic and proteomic sources has become immensely popular among researchers. One such issue is feature or gene selection and identifying relevant and non-redundant marker genes from high dimensional gene expression data sets. In that context, designing an efficient feature selection algorithm exploiting knowledge from multiple potential biological resources may be an effective way to understand the spectrum of cancer or other diseases with applications in specific epidemiology for a particular population.ResultsIn the current article, we design the feature selection and marker gene detection as a multi-view multi-objective clustering problem. Regarding that, we propose an Unsupervised Multi-View Multi-Objective clustering-based gene selection approach called UMVMO-select. Three important resources of biological data (gene ontology, protein interaction data, protein sequence) along with gene expression values are collectively utilized to design two different views. UMVMO-select aims to reduce gene space without/minimally compromising the sample classification efficiency and determines relevant and non-redundant gene markers from three cancer gene expression benchmark data sets.ConclusionA thorough comparative analysis has been performed with five clustering and nine existing feature selection methods with respect to several internal and external validity metrics. Obtained results reveal the supremacy of the proposed method. Reported results are also validated through a proper biological significance test and heatmap plotting.

Highlights

  • In recent years, to investigate challenging bioinformatics problems, the utilization of multiple genomic and proteomic sources has become immensely popular among researchers

  • This indicates that genes of the same obtained clusters are more involved in similar biological processes compared to remaining genes in the genome

  • The obtained gene clusters are validated biologically through Gene ontology (GO) enrichment analysis, and the obtained test outcome is reported in Table 5 for random two clusters from best ensembled solution

Read more

Summary

Introduction

To investigate challenging bioinformatics problems, the utilization of multiple genomic and proteomic sources has become immensely popular among researchers. One such issue is feature or gene selection and identifying relevant and non-redundant marker genes from high dimensional gene expression data sets. As a popular solution of ‘Curse of dimensionality’ [1], in the past few years, various gene (or feature) selection methods [2,3,4] have been invented by researchers Those methods aim to discard redundant genes from expression data sets and keep only a smaller subset of relevant genes that effectively participate in sample classification. Existing research indicates that genetic markers are highly involved in different cancer pathways; they can be useful for diagnosing and assessing drug efficacy and toxicity

Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call