Abstract

Cancer genomic data contain views from different sources that provide complementary information about genetic activity, which opens a new avenue for cancer research. Feature selection and multi-view clustering are active research topics in bioinformatics, and both can exploit this complementary information to improve performance. In this paper, a novel integrated model called Multi-view Non-negative Matrix Factorization (MvNMF) is proposed for the selection of common differential genes (co-differential genes) and multi-view clustering. To encode the geometric information in multi-view genomic data, graph regularized MvNMF (GMvNMF) is further proposed by adding a graph regularization constraint to the objective function. GMvNMF not only obtains the potential shared feature structure and shared cluster group structure, but also captures the manifold structure of multi-view data. The validity of the proposed GMvNMF method was tested on four multi-view genomic datasets. Experimental results showed that GMvNMF performs better than other representative methods.

Highlights

  • With the rapid development of gene sequencing technology, a large number of multi-view data have been generated

  • To make effective use of the information in multiple views, we propose the Multi-view Non-negative Matrix Factorization (MvNMF) model and further improve it to obtain graph regularized MvNMF (GMvNMF)

  • Since the differential genes we selected are common to gene expression (GE), copy number variation (CNV), and methylation (ME) data, the selected co-differential genes have greater biological significance


Summary

Introduction

With the rapid development of gene sequencing technology, a large number of multi-view data have been generated. Multi-view data are informative and carry genetic activity information at multiple levels. Exploring this information provides an unprecedented opportunity to discover the molecular mechanisms of cancer [1]. The Cancer Genome Atlas (TCGA) is the largest genome-based platform. It provides a large number of different types of omics data. We use gene expression (GE), copy number variation (CNV), and methylation (ME) data of four cancers in the TCGA database. These data types are mutually dependent [2].

