Abstract

The Cancer Genome Atlas (TCGA) project has made multiple heterogeneous datasets available. Although several methodological approaches have been proposed for heterogeneous data integration, no sparse non-negative matrix factorization (NMF) framework has yet been developed for integrating heterogeneous biological data. Here, we propose block-weighted sparse NMF (bwsNMF) to identify tumor subtypes of endometrial carcinoma by integrating gene expression, mutations, a protein-protein interaction network and a transcription factor target network.
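The abstract does not spell out the bwsNMF objective, so the following is only a minimal sketch of the block-weighting idea under stated assumptions. Each patient-indexed data block X_b (features x patients, e.g. expression values or binary mutation calls) gets its own non-negative factor W_b. All blocks share one coefficient matrix H (k x patients), and hypothetical weights w_b scale each block's squared reconstruction error. The sketch uses plain Lee-Seung style multiplicative updates, omits the sparseness penalty, and does not model the two network blocks. The names `block_weighted_nmf` and `weighted_loss` are illustrative, not the authors' code.

```python
import numpy as np


def weighted_loss(blocks, weights, Ws, H):
    """Block-weighted objective: sum_b w_b * ||X_b - W_b H||_F^2."""
    return float(sum(w * np.linalg.norm(X - W @ H) ** 2
                     for w, X, W in zip(weights, blocks, Ws)))


def block_weighted_nmf(blocks, weights, k, n_iter=500, seed=0, eps=1e-10):
    """Minimise sum_b w_b * ||X_b - W_b H||_F^2 over W_b >= 0, H >= 0.

    blocks: list of non-negative (features_b x patients) arrays that share
    the same patient columns. H (k x patients) is shared by all blocks.
    Assumed formulation for illustration only; no sparseness term.
    """
    rng = np.random.default_rng(seed)
    n = blocks[0].shape[1]
    Ws = [rng.random((X.shape[0], k)) for X in blocks]
    H = rng.random((k, n))
    for _ in range(n_iter):
        # Each W_b appears only in its own term, so its weight cancels out.
        for b, X in enumerate(blocks):
            Ws[b] *= (X @ H.T) / (Ws[b] @ H @ H.T + eps)
        # The H update is Lee-Seung on the stacked blocks sqrt(w_b) * X_b.
        numer = sum(w * W.T @ X for w, W, X in zip(weights, Ws, blocks))
        denom = sum(w * W.T @ W @ H for w, W in zip(weights, Ws))
        H *= numer / (denom + eps)
    # Assign each patient to the factor with the largest coefficient.
    labels = H.argmax(axis=0)
    return Ws, H, labels
```

Reading clusters off the largest entry in each column of H is one common convention for NMF-based subtyping. In this sketch, raising w_b makes block b pull harder on the shared H, which is what makes the choice of weights consequential.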

Highlights

  • Clustering algorithms have been applied to the identification of new subtypes of human cancer

  • We propose block-weighted sparse non-negative matrix factorization (bwsNMF) to identify subtypes of endometrial carcinoma by integrating gene expression data, gene mutations, a protein-protein interaction network, and a transcription factor target network

  • We briefly review a type of sparse NMF based on alternating non-negativity-constrained least squares [6] that imposes sparseness constraints on the basis/metagene matrix: m
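Assuming [6] refers to an SNMF/L-style formulation that penalizes the squared L1 norm of each row of the basis matrix W, the objective is ||A - WH||_F^2 + eta ||H||_F^2 + beta sum_i ||W(i,:)||_1^2 with W, H >= 0. Alternating non-negativity-constrained least squares (ANLS) then alternates two NNLS problems. H is fitted from the stacked system [W; sqrt(eta) I] H ~ [A; 0], and W^T from [H^T; sqrt(beta) 1^T] W^T ~ [A^T; 0]. The sketch below solves each subproblem approximately with a few warm-started sweeps of cyclic coordinate descent, which may differ from the NNLS solver used in [6]. The default values of `eta`, `beta` and the iteration counts are illustrative assumptions.

```python
import numpy as np


def nnls_cd(C, B, X0, n_iter=200):
    """Approximately solve min_{X >= 0} ||C X - B||_F^2 by coordinate descent.

    Each row of X is minimised exactly given the other rows (Gauss-Seidel),
    so repeated sweeps converge to the NNLS solution of this convex problem.
    """
    Q = C.T @ C
    P = C.T @ B
    X = X0.copy()
    for _ in range(n_iter):
        for j in range(X.shape[0]):
            if Q[j, j] > 0:
                grad_j = Q[j] @ X - P[j]  # half the gradient for row j
                X[j] = np.maximum(0.0, X[j] - grad_j / Q[j, j])
    return X


def snmf_l(A, k, eta=1.0, beta=0.01, n_outer=100, n_inner=20, seed=0):
    """Sparse NMF with sparse rows of the basis matrix W, fitted by ANLS.

    Minimises ||A - W H||_F^2 + eta ||H||_F^2 + beta sum_i ||W[i, :]||_1^2
    subject to W, H >= 0 (an assumed SNMF/L-style objective).
    """
    m, n = A.shape
    rng = np.random.default_rng(seed)
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(n_outer):
        # H-step: the ridge rows sqrt(eta) * I add eta * ||H||_F^2.
        C = np.vstack([W, np.sqrt(eta) * np.eye(k)])
        B = np.vstack([A, np.zeros((k, n))])
        H = nnls_cd(C, B, H, n_iter=n_inner)
        # W-step: the row sqrt(beta) * 1^T adds beta * (row sum of W)^2,
        # which equals the squared L1 norm of each row because W >= 0.
        C = np.vstack([H.T, np.sqrt(beta) * np.ones((1, k))])
        B = np.vstack([A.T, np.zeros((1, m))])
        W = nnls_cd(C, B, W.T, n_iter=n_inner).T
    # With A as genes x samples, samples are clustered by their dominant metagene.
    labels = H.argmax(axis=0)
    return W, H, labels
```

Stacking the penalty terms as extra rows keeps every step a plain NNLS problem, which is the main appeal of the ANLS formulation. Larger `beta` drives more entries of each row of W to zero.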


Summary

Introduction

Clustering algorithms have been applied to the identification of new subtypes of human cancer. Clustering methods based on matrix computations, such as non-negative matrix factorization (NMF), can be modified to handle the harder problem of clustering patients across multiple heterogeneous datasets. We will show how the NMF formulation can be modified for tumor subtype identification with multiple heterogeneous datasets. The central difficulty is determining the weight parameters when no training set with subtype labels is available. We therefore use another type of data, namely clinical data including patients' survival days: the best subtypes are defined as patient groups with different survival profiles, and the weight parameters are chosen accordingly. In other words, our objective is to identify tumor subtypes that maximize survival differences by searching the weight parameter space. We believe that NMF provides a useful mathematical framework for formulating a more complex objective function without losing computational efficiency.
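The weight search described above can be sketched as a grid search that scores each candidate weight vector by how strongly its clustering separates survival curves. As an assumption, the score here is the standard multi-group log-rank chi-square statistic (G - 1 degrees of freedom), computed from survival days and event indicators. `cluster_fn` is a hypothetical callable standing in for a bwsNMF run with the given block weights. The toy data in the usage block are made up for illustration.

```python
import itertools

import numpy as np


def logrank_statistic(times, events, labels):
    """Multi-group log-rank chi-square statistic with G - 1 degrees of freedom."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    G = len(groups)
    if G < 2:
        return 0.0
    observed = np.zeros(G)
    expected = np.zeros(G)
    cov = np.zeros((G, G))
    for t in np.unique(times[events]):  # distinct event times only
        at_risk = times >= t
        died = events & (times == t)
        n_t, d_t = at_risk.sum(), died.sum()
        n_g = np.array([(at_risk & (labels == g)).sum() for g in groups])
        d_g = np.array([(died & (labels == g)).sum() for g in groups])
        frac = n_g / n_t
        observed += d_g
        expected += d_t * frac
        if n_t > 1:  # hypergeometric variance term; zero when one subject is left
            scale = d_t * (n_t - d_t) / (n_t - 1)
            cov += scale * (np.diag(frac) - np.outer(frac, frac))
    # The full covariance is singular, so drop one group before inverting.
    diff = (observed - expected)[:-1]
    return float(diff @ np.linalg.pinv(cov[:-1, :-1]) @ diff)


def search_block_weights(cluster_fn, weight_grid, times, events):
    """Return the weights whose clustering maximises the log-rank statistic.

    cluster_fn: hypothetical callable mapping a weight tuple to patient labels.
    """
    best_weights, best_stat = None, -np.inf
    for weights in weight_grid:
        stat = logrank_statistic(times, events, cluster_fn(weights))
        if stat > best_stat:
            best_weights, best_stat = weights, stat
    return best_weights, best_stat


if __name__ == "__main__":
    times = [1, 2, 3, 4, 5, 6, 7, 8]  # toy survival days
    events = [1] * 8  # 1 = death observed, 0 = censored

    def toy_cluster(weights):  # hypothetical stand-in for a bwsNMF run
        return [0, 0, 0, 0, 1, 1, 1, 1] if weights[0] > weights[1] else [0, 1] * 4

    grid = list(itertools.product([0.5, 1.0, 2.0], repeat=2))
    print(search_block_weights(toy_cluster, grid, times, events))
```

A full grid grows exponentially with the number of data blocks, so with four blocks a coarse grid is a practical starting point. Because the statistic can be inflated by very small clusters, a minimum cluster size may also be worth enforcing.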
