Multi-Omics Data Fusion via a Joint Kernel Learning Model for Cancer Subtype Discovery and Essential Gene Identification.

Jie Feng, Lan Wen, Limin Jiang, Jijun Tang, Shuhao Li

doi:10.3389/fgene.2021.647141

Open Access

https://doi.org/10.3389/fgene.2021.647141

Abstract

Cancer arises from multiple sources and has multiple causes, and the same cancer can comprise many different subtypes. Identifying cancer subtypes is a key part of personalized cancer treatment and provides an important reference for clinical diagnosis and treatment. Some studies have shown significant differences in the genetic and epigenetic profiles of different cancer subtypes during carcinogenesis and development. In this study, we first collect seven cancer datasets from the Broad Institute GDAC Firehose, each including a gene expression profile, an isoform expression profile, DNA methylation data, and the corresponding survival information. We then employ kernel principal component analysis (kernel PCA) to extract features from each expression profile, convert them into three similarity kernel matrices using a Gaussian kernel function, and fuse these matrices into a global kernel matrix. Finally, we apply a spectral clustering algorithm to the global kernel matrix to obtain the clustering results for different cancer subtypes. In the experiments, besides using the P-value from the Cox regression model and survival analysis as the primary evaluation measures, we also introduce statistical indicators such as the Rand index (RI) and the adjusted Rand index (ARI) to verify clustering performance. Then, combining the clustering results with the gene expression profile, we identify genes that are differentially expressed among subtypes by gene set enrichment analysis. For lung cancer, GMPS, EPHA10, C10orf54, and MAGEA6 are highly expressed in different subtypes; for liver cancer, CMYA5, DEPDC6, FAU, VPS24, RCBTB2, LOC100133469, and SLC35B4 are significantly expressed in different subtypes.
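The pipeline described in the abstract (kernel PCA per omics view, a Gaussian kernel per view, fusion into a global kernel, spectral clustering, then RI/ARI) can be illustrated with a minimal sketch using NumPy and scikit-learn. Several pieces are assumptions rather than the paper's method: the data are synthetic, the function names `view_kernel`, `fuse_kernels`, and `run_demo` are hypothetical, the Gaussian bandwidth is set by a median heuristic, and a plain weighted average stands in for the joint kernel learning step, which the abstract does not specify. The Cox regression and survival analysis are not covered here.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.decomposition import KernelPCA
from sklearn.metrics import adjusted_rand_score, rand_score
from sklearn.metrics.pairwise import euclidean_distances


def view_kernel(X, n_components=10):
    """Kernel PCA features for one omics view, then a Gaussian similarity kernel."""
    Z = KernelPCA(n_components=n_components, kernel="rbf").fit_transform(X)
    D = euclidean_distances(Z, squared=True)
    # Assumption: bandwidth from the median squared pairwise distance.
    bandwidth = np.median(D[np.triu_indices_from(D, k=1)])
    return np.exp(-D / bandwidth)


def fuse_kernels(kernels, weights=None):
    """Assumption: a weighted average stands in for the paper's fusion step."""
    if weights is None:
        weights = np.full(len(kernels), 1.0 / len(kernels))
    fused = sum(w * K for w, K in zip(weights, kernels))
    return (fused + fused.T) / 2.0  # enforce exact symmetry


def run_demo(seed=0, n_per_cluster=20, n_clusters=3):
    """Cluster synthetic three-view data and score it against known labels."""
    rng = np.random.default_rng(seed)
    truth = np.repeat(np.arange(n_clusters), n_per_cluster)
    views = []
    for n_features in (40, 60, 50):  # stand-ins for gene, isoform, methylation
        centers = 2.0 * rng.normal(size=(n_clusters, n_features))
        noise = 0.5 * rng.normal(size=(truth.size, n_features))
        views.append(centers[truth] + noise)

    fused = fuse_kernels([view_kernel(X) for X in views])
    model = SpectralClustering(
        n_clusters=n_clusters, affinity="precomputed", random_state=seed
    )
    labels = model.fit_predict(fused)
    return fused, labels, rand_score(truth, labels), adjusted_rand_score(truth, labels)


if __name__ == "__main__":
    fused, labels, ri, ari = run_demo()
    print(f"RI = {ri:.3f}, ARI = {ari:.3f}")
```

On real data there are usually no ground-truth subtype labels, which is why the abstract treats the Cox regression P-value and survival analysis as the primary measures. RI and ARI apply only where reference labels exist.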

Highlights

Cancer is a leading cause of death worldwide and was responsible for an estimated 9.6 million deaths in 2018
The Cancer Genome Atlas (TCGA) covers a variety of omics data, including genomics, transcriptomics, copy number variation, DNA methylation, proteomics, and clinical information from follow-up cases (Tomczak et al., 2015; Jiang et al., 2019a), which provides great support for detecting cancer subtypes by computational methods
We propose a novel method for analyzing various cancer subtypes

Summary

INTRODUCTION

Cancer is a leading cause of death worldwide and was responsible for an estimated 9.6 million deaths in 2018. The Cancer Genome Atlas (TCGA), initiated by the US government, is the largest open cancer genome database to date. It aims to catalog and discover major cancer-causing genome alterations in large cohorts covering over 30 types of human tumors through large-scale genome sequencing and integrated multidimensional analyses. It covers a variety of omics data, including genomics, transcriptomics, copy number variation, DNA methylation, proteomics, and clinical information from follow-up cases (Tomczak et al., 2015; Jiang et al., 2019a), which provides great support for detecting cancer subtypes by computational methods. iCluster is a latent variable model-based clustering algorithm proposed by Shen et al. (2009). It integrates multiple sources of data to identify tumor subtypes. In our method, we apply a spectral clustering algorithm to the fused global kernel matrix to obtain the clustering results for different cancer subtypes.

MATERIALS AND METHODS

RESULTS

Evaluation Novel Method

CONCLUSION

DATA AVAILABILITY STATEMENT

Journal: Frontiers in Genetics
Publication Date: Mar 4, 2021
Citations: 11
License type: CC BY 4.0

Similar Papers

Abstract 5475: Evaluation of feature selection methods used for cluster analysis in identification of novel cancer subtypes
Linda Vidman ... David Källberg
Cancer Research | VOL. 80
13 Aug 2020

Abstract 7566: Identification of cancer subtypes with a ctDNA-based targeted methylation assay
Tracy Nance ... Charles Swanton
Cancer Research | VOL. 84
22 Mar 2024

Comparison of Methods for Feature Selection in Clustering of High-Dimensional RNA-Sequencing Data to Identify Cancer Subtypes.
David Källberg ... Linda Vidman
Frontiers in Genetics | VOL. 12
24 Feb 2021

Dissecting super-enhancer heterogeneity: time to re-examine cancer subtypes?
Tan Wu ... Xin Wang
Trends in Genetics | VOL. 38
05 Jul 2022
