Abstract
With the rapid progress of high-throughput sequencing technology, diverse genomic data have become easy to obtain. Integrative methods are urgently needed to exploit the value of those data effectively. In this paper, building on SNF (Similarity Network Fusion) [1], we propose a new integrative method named ndmaSNF (network diffusion model assisted SNF). It can be used for cancer subtype discovery and has the advantage of making use of somatic mutation data and other discrete data. First, we apply a network diffusion model to the mutation data to make them smoothed and adaptive. Then, the mutation data and the other data types are used in the SNF framework by constructing a patient-by-patient similarity network for each data type. Finally, a nonlinear iterative method produces a fused patient network that contains the information from all input data types. Applying a clustering algorithm to the fused network yields the cancer subtypes. Experimental results on four cancer datasets show that ndmaSNF can find subtypes with significant differences in survival profile and other clinical features.
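The smoothing step described in the abstract can be illustrated with a short sketch. The abstract does not give the exact diffusion model, so the code below assumes a random walk with restart over a gene interaction network, a common choice for propagating sparse mutation profiles. The function name `diffuse_mutations`, the restart weight `alpha = 0.7`, and the toy three-gene network are all hypothetical and are not taken from the paper.

```python
import numpy as np

def diffuse_mutations(F0, adjacency, alpha=0.7, tol=1e-8, max_iter=1000):
    """Smooth a patient-by-gene mutation matrix over a gene network.

    Assumed update (random walk with restart, not confirmed by the source):
        F_next = alpha * F @ A + (1 - alpha) * F0
    where A is the row-normalised adjacency matrix of the gene network.
    """
    F0 = np.asarray(F0, dtype=float)
    adjacency = np.asarray(adjacency, dtype=float)
    # Row-normalise so each gene passes its score on to its neighbours.
    degree = adjacency.sum(axis=1, keepdims=True)
    degree[degree == 0] = 1.0  # avoid division by zero for isolated genes
    A = adjacency / degree
    F = F0.copy()
    for _ in range(max_iter):
        F_next = alpha * F @ A + (1 - alpha) * F0
        if np.linalg.norm(F_next - F) < tol:
            return F_next
        F = F_next
    return F

# Toy example: three genes on a path 0 - 1 - 2, two patients.
adjacency = np.array([[0, 1, 0],
                      [1, 0, 1],
                      [0, 1, 0]])
mutations = np.array([[1, 0, 0],   # patient 0: gene 0 mutated
                      [0, 0, 1]])  # patient 1: gene 2 mutated
smoothed = diffuse_mutations(mutations, adjacency)
```

After diffusion, each patient's profile is a dense, continuous vector in which genes close to a mutated gene in the network also receive a score. This is what allows a distance-based similarity network to be built from otherwise sparse binary data.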
Highlights
Cancer is believed to be a complicated and heterogeneous disease because it is driven by different combinations of mutated genes rather than by individual genes, and those mutations vary among tumor samples
Great efforts have been made by several large-scale projects such as The Cancer Genome Atlas (TCGA) [2], the International Cancer Genome Consortium (ICGC) [3], and the Cancer Cell Line Encyclopedia (CCLE) [4], which have generated large volumes of data across multiple genomic platforms
For each data type, a sample-by-sample similarity network is constructed using the Euclidean distance and a scaled exponential similarity kernel; these similarity networks are then fused into a single network by a nonlinear iterative method
Summary
Cancer is believed to be a complicated and heterogeneous disease because it is driven by different combinations of mutated genes rather than by individual genes, and those mutations vary among tumor samples. For each data type, a sample-by-sample similarity network is constructed using the Euclidean distance and a scaled exponential similarity kernel; these similarity networks are then fused into a single network by a nonlinear iterative method. Finally, the fused network is partitioned by spectral clustering to obtain several tumor groups. In SNF, diverse data such as DNA methylation, mRNA expression and miRNA expression data were used to identify meaningful cancer subtypes. Those data types take continuous values, for which the Euclidean metric is suitable. For discrete data, the SNF authors propose using the chi-squared distance (Supplementary Note-Chi-squared distance) to calculate the similarity between patients, but this does not yield satisfactory results.
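The per-data-type similarity network mentioned above can be sketched as follows. This is a minimal illustration of a scaled exponential similarity kernel on Euclidean distances, in the form W(i, j) = exp(-d(i, j)^2 / (mu * eps(i, j))), where eps(i, j) averages the mean distance of each sample to its k nearest neighbours with the distance between the two samples. The function name, the defaults `k = 2` and `mu = 0.5`, and the toy data are assumptions made for illustration and are not values reported in the text.

```python
import numpy as np

def scaled_exponential_similarity(X, k=2, mu=0.5):
    """Build a sample-by-sample similarity matrix from a feature matrix X.

    X has one row per sample. k and mu are illustrative defaults,
    not values taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between samples.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Mean distance from each sample to its k nearest neighbours
    # (column 0 after sorting is the sample itself, at distance 0).
    knn_mean = np.sort(dist, axis=1)[:, 1:k + 1].mean(axis=1)
    # Local scaling term that adapts the kernel width to each pair.
    eps = (knn_mean[:, None] + knn_mean[None, :] + dist) / 3.0
    eps = np.maximum(eps, 1e-12)  # guard against duplicate samples
    return np.exp(-dist ** 2 / (mu * eps))

# Toy data: two well-separated pairs of samples.
X = np.array([[0.0, 0.0],
              [0.1, 0.0],
              [5.0, 5.0],
              [5.1, 5.0]])
W = scaled_exponential_similarity(X)
```

In this toy case the similarity within each pair is close to 1 and the similarity across pairs is close to 0. A matrix of this kind, computed once per data type, is what the fusion step takes as input.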