Identification of Tumor Subtypes of Endometrial Carcinoma by Integration of Heterogeneous Datasets

Ho Kim, Markus Bredel

doi:10.4172/2168-9784.1000189

Abstract

The Cancer Genome Atlas (TCGA) project has made multiple heterogeneous datasets available. Although several methodological approaches have been proposed for heterogeneous data integration, no sparse non-negative matrix factorization (NMF) framework exists for integrating heterogeneous biological data. Here, we propose the block-weighted sparse NMF (bwsNMF) to identify tumor subtypes of endometrial carcinoma by integrating gene expression, mutations, a protein-protein interaction network, and a transcription factor target network.

Highlights

Clustering algorithms have been applied to the identification of new subtypes of human cancer
We propose the block-weighted sparse non-negative matrix factorization (NMF) to identify subtypes of endometrial carcinoma by integrating gene expression data, gene mutations, a protein-protein interaction network, and a transcription factor target network
We briefly review a type of sparse NMF based on alternating non-negativity-constrained least squares [6] to impose sparseness constraints on the basis/metagene matrix: m

Summary

Introduction

Clustering algorithms have been applied to identify new subtypes of human cancer. Clustering methods based on matrix computations, such as non-negative matrix factorization (NMF), can be adapted to this problem. We show how the NMF formulation can be modified to identify tumor subtypes from multiple heterogeneous datasets. One objective is to determine the weight parameters when no training set with subtype labels is available. We therefore use another type of data, namely clinical data that include patient survival days, and choose the weight parameters that yield the best subtypes, i.e., patient groups with distinct survival profiles. In other words, we search the weight parameter space for the tumor subtypes that maximize survival differences. We believe that NMF provides a useful mathematical framework for formulating a more complex objective function without losing computational efficiency.
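The idea of weighting data blocks inside a single NMF can be illustrated with a minimal sketch. This is not the bwsNMF algorithm itself: it assumes every block is a non-negative feature-by-patient matrix, it uses plain multiplicative updates rather than the sparse alternating non-negativity-constrained least squares formulation, and it omits the sparseness penalty. The function names (block_weighted_nmf, assign_subtypes) and the weight values are hypothetical and chosen only for illustration.

```python
import numpy as np


def block_weighted_nmf(blocks, weights, k, n_iter=200, seed=0, eps=1e-9):
    """Factorize stacked, weighted blocks as X ~ W @ H (illustrative only).

    blocks  : list of non-negative arrays, each (n_features_b, n_patients)
    weights : one non-negative weight per block (assumed given, not learned)
    k       : number of metagenes / candidate subtypes
    """
    # Scaling block b by sqrt(w_b) makes its squared reconstruction
    # error contribute w_b * ||X_b - W_b H||_F^2 to the total objective.
    X = np.vstack([
        np.sqrt(w) * np.asarray(B, dtype=float)
        for B, w in zip(blocks, weights)
    ])

    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k)) + eps  # basis / metagene matrix
    H = rng.random((k, X.shape[1])) + eps  # coefficient matrix (per patient)

    # Standard multiplicative updates for the Frobenius-norm objective;
    # eps guards against division by zero.
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
    return W, H


def assign_subtypes(H):
    """Assign each patient to the metagene with the largest coefficient."""
    return np.argmax(H, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    expression = rng.random((6, 10))                          # toy expression block
    mutation = (rng.random((4, 10)) > 0.5).astype(float)      # toy binary mutation block
    W, H = block_weighted_nmf([expression, mutation], [1.0, 0.5], k=2)
    print(assign_subtypes(H))
```

In a weight search of the kind described above, a loop over candidate weight vectors would call block_weighted_nmf for each candidate and score the resulting patient groups by how well they separate survival, for example with a log-rank statistic; that scoring step is left out here because the source does not specify it.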

Objectives

Results

Conclusion