Abstract

Discovering cancer subtypes helps guide the clinical treatment of many cancers. Advances in tissue profiling technologies have accumulated diverse types of data, and various computational methods have been proposed to predict cancer subtypes from these expression data. A key question is how best to integrate these multiple data profiles. In this paper, we collect multiple profiles of data for five cancers from The Cancer Genome Atlas (TCGA). For all patients with the same cancer, we construct three similarity kernels from gene expression, miRNA expression and isoform expression data. We then propose a novel unsupervised multiple kernel fusion method, Similarity Kernel Fusion (SKF), to integrate the three similarity kernels into one combined kernel. Finally, we apply spectral clustering to the integrated kernel to predict cancer subtypes. In the experiments, we evaluate the predicted subtypes on three datasets using P-values from the Cox regression model and from survival curve analysis. SKF performs better than single kernels and other multiple kernel fusion strategies, which demonstrates that our method identifies more accurate subtypes across various kinds of cancers. Our cancer subtype prediction method can identify essential genes and biomarkers for disease diagnosis and prognosis, and we also discuss possible side effects of therapies and treatment.
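The abstract scores predicted subtypes with P-values from Cox regression and survival curve analysis, but it does not state which test produced the survival-curve P-values. A k-group log-rank test is a common choice for comparing survival curves across subtypes, so the minimal sketch below assumes it. `logrank_test` is a hypothetical helper, NumPy and SciPy are assumed to be available, and the Cox regression part is not reproduced.

```python
import numpy as np
from scipy.stats import chi2


def logrank_test(times, events, groups):
    """Assumed k-group log-rank test for predicted subtypes.

    times  : follow-up time per patient
    events : 1 if death was observed, 0 if censored
    groups : predicted subtype label per patient
    Returns (chi-square statistic, P-value) with k - 1 degrees of freedom.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    k = len(labels)
    if k < 2:
        raise ValueError("need at least two subtypes")

    observed = np.zeros(k)
    expected = np.zeros(k)
    cov = np.zeros((k, k))

    # Visit every distinct time at which at least one death was observed.
    for t in np.unique(times[events == 1]):
        at_risk = times >= t
        died = (times == t) & (events == 1)
        n = at_risk.sum()
        d = died.sum()
        n_g = np.array([(at_risk & (groups == g)).sum() for g in labels])
        d_g = np.array([(died & (groups == g)).sum() for g in labels])
        frac = n_g / n
        observed += d_g
        expected += d * frac  # deaths expected if all subtypes shared one hazard
        if n > 1:
            # Hypergeometric covariance of the per-subtype death counts.
            cov += d * (n - d) / (n - 1) * (np.diag(frac) - np.outer(frac, frac))

    # The k deviations sum to zero, so drop one subtype to get an invertible block.
    diff = (observed - expected)[:-1]
    stat = float(diff @ np.linalg.solve(cov[:-1, :-1], diff))
    return stat, float(chi2.sf(stat, df=k - 1))
```

For example, `logrank_test(days, death_observed, subtype_labels)` (hypothetical variable names) returns the statistic and P-value for whether the subtypes have different survival. The `lifelines` package provides `multivariate_logrank_test` and `CoxPHFitter` for the same analyses with more diagnostics.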

Highlights

  • Cancer is a heterogeneous disease caused by chemical, physical, or genetic factors (Mager, 2006; Liu and Chu, 2014)

  • We propose a novel unsupervised multiple kernel fusion method, Similarity Kernel Fusion (SKF), in order to integrate three similarity kernels into one combined kernel

  • The second dataset is provided in Wang et al (2014), which includes lung cancer, kidney cancer, breast cancer, colon cancer, and glioblastoma multiforme (GBM)
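The highlighted workflow, three patient similarity kernels fused into one combined kernel and then clustered, can be illustrated with the minimal sketch below. It is not SKF: the text here does not give SKF's fusion rule, so the fusion step is a naive element-wise average used as a hypothetical placeholder. The Gaussian kernel with a median-distance bandwidth, the Ng-Jordan-Weiss style spectral clustering, and the function names `gaussian_kernel`, `average_fusion` and `spectral_clustering` are likewise assumptions.

```python
import numpy as np


def gaussian_kernel(X, sigma=None):
    """Patient-by-patient similarity for one data type (rows = patients)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
    if sigma is None:
        # Assumed bandwidth: median off-diagonal squared distance (median heuristic).
        off_diag = ~np.eye(len(X), dtype=bool)
        sigma = np.sqrt(np.median(d2[off_diag]))
    return np.exp(-d2 / (2 * sigma ** 2))


def average_fusion(kernels):
    """Hypothetical stand-in for SKF: element-wise mean of the kernels."""
    return np.mean(kernels, axis=0)


def spectral_clustering(W, k, n_iter=100):
    """Normalised spectral clustering (Ng-Jordan-Weiss style) on affinity W."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh sorts eigenvalues ascending, so the last k columns are the top-k eigenvectors.
    _, vecs = np.linalg.eigh(M)
    U = vecs[:, -k:]
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    # Small k-means with deterministic farthest-point initialisation.
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            U[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels
```

A typical call builds one kernel per data type (for example gene, miRNA and isoform expression matrices with patients as rows), fuses them with `average_fusion`, and passes the result to `spectral_clustering(K, k=number_of_subtypes)`. For real data, scikit-learn's `SpectralClustering(affinity="precomputed")` is an off-the-shelf alternative for the last step.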

Summary

INTRODUCTION

Cancer is a heterogeneous disease caused by chemical, physical, or genetic factors (Mager, 2006; Liu and Chu, 2014). Wang et al (2014) proposed the Similarity Network Fusion (SNF) approach for accurately clustering cancer subtypes. This method first collects three types of genome-wide data: gene expression, DNA methylation and miRNA expression. It constructs a network of samples (e.g., patients) from each data type and then uses SNF to fuse these networks into a single network that represents the full spectrum of the underlying data. Finally, it applies spectral clustering to the integrated network to predict cancer subtypes. Shen et al (2009) proposed the iCluster method, which is based on a Gaussian latent variable model, to discover cancer subtypes. This method was tested on breast cancer and lung cancer using copy number and gene expression data. We compare the integrated kernel with single kernels and other fusion methods, and analyze the survival curves of the clinical data.
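Since SNF is summarized here only in words, the minimal NumPy sketch below illustrates its core idea, cross-network diffusion, as we understand it from Wang et al (2014). Each data type yields a patient similarity matrix W, a normalized full kernel P and a sparse k-nearest-neighbour kernel S, and each view's P is repeatedly updated as S(v) · mean(P of the other views) · S(v)ᵀ. The exact kernel form, the defaults `K`, `t` and `mu`, the final symmetrization and all function names are assumptions for illustration; the reference SNFtool implementation may differ in details such as per-iteration normalization. This is not SKF.

```python
import numpy as np


def affinity(X, K=5, mu=0.5):
    """Locally scaled Gaussian similarity between patients (rows of X)."""
    sq = np.sum(X ** 2, axis=1)
    D = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0))
    # Mean distance from each patient to its K nearest neighbours (column 0 is itself).
    knn_mean = np.sort(D, axis=1)[:, 1:K + 1].mean(axis=1)
    eps = (knn_mean[:, None] + knn_mean[None, :] + D) / 3
    # Assumption: unnormalised Gaussian form exp(-rho^2 / (2 (mu * eps)^2)).
    return np.exp(-D ** 2 / (2 * (mu * eps) ** 2))


def full_kernel(W):
    """P: off-diagonal entries of each row sum to 1/2, and P(i, i) = 1/2."""
    off = W - np.diag(np.diag(W))
    P = off / (2 * off.sum(axis=1, keepdims=True))
    np.fill_diagonal(P, 0.5)
    return P


def local_kernel(W, K=5):
    """S: keep each patient's K most similar patients (itself included); rows sum to 1."""
    S = np.zeros_like(W)
    for i in range(len(W)):
        nn = np.argsort(-W[i])[:K]
        S[i, nn] = W[i, nn] / W[i, nn].sum()
    return S


def snf(views, K=5, t=20, mu=0.5):
    """Fuse per-view patient networks by cross-network diffusion."""
    if len(views) < 2:
        raise ValueError("need at least two data types")
    Ws = [affinity(X, K, mu) for X in views]
    Ps = [full_kernel(W) for W in Ws]
    Ss = [local_kernel(W, K) for W in Ws]
    m = len(Ps)
    for _ in range(t):
        # Parallel update: every view diffuses the average of the other views' P.
        Ps = [
            Ss[v] @ (sum(Ps[u] for u in range(m) if u != v) / (m - 1)) @ Ss[v].T
            for v in range(m)
        ]
    F = sum(Ps) / m
    return (F + F.T) / 2  # symmetrise (assumption) for downstream spectral clustering
```

The sparse S matrices carry only strong local similarities, while the full P matrices carry global structure, so similarities that several data types agree on are reinforced over the iterations. The fused matrix can then be clustered with any spectral clustering routine, as in the workflow described above.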

MATERIALS AND METHODS
Dataset
Similarity Kernel Construction
Similarity Kernel Fusion
Mining Subtypes Using Spectral Clustering
RESULTS
Evaluation Criteria and Verification Method
Parameter Selection for SKF
Performance of SKF in Different Datasets
Comparing With Other Fusion Methods
Survival Analysis
CONCLUSIONS
DATA AVAILABILITY STATEMENT