SMRT: Randomized Data Transformation for Cancer Subtyping and Big Data Analysis.

Hung Nguyen, Monikrishna Roy, Sorin Draghici, Adam Cassell, Sergiu Dascalu, Bang Tran, Duc Tran, Tin Nguyen

doi:10.3389/fonc.2021.725133

Open Access

Abstract

Cancer is an umbrella term that includes a range of disorders, from those that are fast-growing and lethal to indolent lesions with low or delayed potential for progression to death. The treatment options, as well as treatment success, are highly dependent on the correct subtyping of individual patients. With the advancement of high-throughput platforms, we have the opportunity to differentiate among cancer subtypes from a holistic perspective that takes into consideration phenomena at different molecular levels (mRNA, methylation, etc.). This demands powerful integrative methods to leverage large multi-omics datasets for better subtyping. Here we introduce Subtyping Multi-omics using a Randomized Transformation (SMRT), a new method for multi-omics integration and cancer subtyping. SMRT offers the following advantages over existing approaches: (i) a scalable analysis pipeline that allows researchers to integrate multi-omics data and analyze hundreds of thousands of samples in minutes, (ii) the ability to integrate data types with different numbers of patients, (iii) the ability to analyze unmatched data of different types, and (iv) a convenient data analysis pipeline offered to users through a web application. We also improve the efficiency of our ensemble-based perturbation clustering to support analysis on machines with memory constraints. In an extensive analysis, we compare SMRT with eight state-of-the-art subtyping methods using 37 TCGA and two METABRIC datasets comprising a total of almost 12,000 patient samples from 28 different types of cancer. We also perform a number of simulation studies. We demonstrate that SMRT outperforms other methods in identifying subtypes with significantly different survival profiles. In addition, SMRT is extremely fast, being able to analyze hundreds of thousands of samples in minutes. The web application is available at http://SMRT.tinnguyen-lab.com.
The R package will be deposited to CRAN as part of our PINSPlus software suite.
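
The abstract refers to an ensemble-based perturbation clustering, the PINS approach that the PINSPlus suite builds on. The general idea behind such stability-based methods can be illustrated with a minimal Python sketch: cluster the data, re-cluster many noisy copies, and prefer the number of clusters whose partition changes least. This is not the authors' PINS/PINSPlus code. The k-means base clusterer, the Gaussian noise level `noise_sd`, the agreement score, and all function names (`init_centers`, `kmeans`, `connectivity`, `stability`, `choose_k`) are hypothetical choices made only for illustration.

```python
import numpy as np


def init_centers(X, k, rng):
    """Farthest-point initialization: a random first center, then repeatedly
    add the sample farthest from all centers chosen so far."""
    idx = [int(rng.integers(len(X)))]
    for _ in range(k - 1):
        d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d2.argmax()))
    return X[idx].astype(float)


def kmeans(X, k, rng, n_iter=100):
    """Lloyd's k-means; returns one integer cluster label per row of X."""
    centers = init_centers(X, k, rng)
    for _ in range(n_iter):
        # Assign each sample to its nearest center (squared Euclidean distance).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its members; keep it if the cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def connectivity(labels):
    """n x n matrix: 1.0 where two samples share a cluster, else 0.0."""
    return (labels[:, None] == labels[None, :]).astype(float)


def stability(X, k, noise_sd, n_perturb=20, seed=0):
    """Agreement in [0, 1] between the clustering of X and the average
    clustering of n_perturb copies of X with added Gaussian noise."""
    rng = np.random.default_rng(seed)
    original = connectivity(kmeans(X, k, rng))
    perturbed = np.zeros_like(original)
    for _ in range(n_perturb):
        noisy = X + rng.normal(0.0, noise_sd, size=X.shape)
        perturbed += connectivity(kmeans(noisy, k, rng))
    perturbed /= n_perturb
    return 1.0 - np.abs(original - perturbed).mean()


def choose_k(X, noise_sd, k_range=range(2, 6), **kwargs):
    """Return the k whose partition is most stable under noise, plus all scores."""
    scores = {k: stability(X, k, noise_sd, **kwargs) for k in k_range}
    return max(scores, key=scores.get), scores


# Usage: three well-separated synthetic groups of 30 samples each.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 8.66]])
X = np.vstack([c + rng.normal(0.0, 0.3, size=(30, 2)) for c in centers])
best_k, scores = choose_k(X, noise_sd=0.3)
```

Because connectivity matrices compare co-membership rather than label values, the score is unaffected by the arbitrary numbering of clusters across runs.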

Highlights

Since cancer is a heterogeneous disease, the correct identification of cancer subtypes is essential for accurate prognosis and improved treatment.
Users can normalize and concatenate multiple data types into a single matrix and apply well-known methods developed for single-omics analysis, such as ConsensusClusterPlus [5], to determine the subtypes.
SMRT first projects each data type onto a lower-dimensional space using randomized singular value decomposition (RSVD) and then performs perturbation clustering (PINS) [29, 30] to determine the subtypes within each data level.
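
The projection step in the last highlight, mapping each data type onto a lower-dimensional space with randomized SVD, can be illustrated with the generic randomized range finder with subspace iteration described by Halko, Martinsson and Tropp. The Python sketch below is an illustration under stated assumptions, not SMRT's implementation: the oversampling and power-iteration defaults, the feature centering in the hypothetical helper `project_patients`, and the synthetic matrices are all choices made here.

```python
import numpy as np


def randomized_svd(A, rank, n_oversample=10, n_power_iter=2, seed=0):
    """Approximate rank-`rank` SVD of A (patients x features)."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = A.shape
    ell = min(rank + n_oversample, n_rows, n_cols)
    # Sketch the column space of A with a Gaussian test matrix.
    omega = rng.standard_normal((n_cols, ell))
    Q, _ = np.linalg.qr(A @ omega)
    # Power (subspace) iterations sharpen the sketch when singular values decay
    # slowly; re-orthonormalizing after each product keeps the iteration stable.
    for _ in range(n_power_iter):
        Z, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Z)
    # Exact SVD of the small (ell x n_cols) matrix B = Q^T A, lifted back by Q.
    B = Q.T @ A
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    return U[:, :rank], s[:rank], Vt[:rank]


def project_patients(A, rank, **kwargs):
    """Patient coordinates in the top-`rank` subspace (PCA-style scores)."""
    A_centered = A - A.mean(axis=0)  # center each feature (an assumption of this sketch)
    U, s, _ = randomized_svd(A_centered, rank, **kwargs)
    return U * s


# Usage: project two hypothetical data types measured on the same 200 patients.
rng = np.random.default_rng(0)
mrna = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 1000))
methylation = rng.standard_normal((200, 3)) @ rng.standard_normal((3, 400))
Z_mrna = project_patients(mrna, rank=5)
Z_methyl = project_patients(methylation, rank=3)
```

Only products with a thin random matrix and one small dense SVD are needed, which is what keeps this kind of projection cheap when a data type has many patients or many features.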

Summary

Introduction

Since cancer is a heterogeneous disease, the correct identification of cancer subtypes is essential for accurate prognosis and improved treatment. Vast amounts of molecular data have accumulated in public repositories, including The Cancer Genome Atlas (TCGA) [1], the Genomic Data Commons Data Portal (GDC) [2], the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) [3], and the UK Biobank [4]. This demands powerful yet fast analysis methods that can leverage large multi-omics datasets for more accurate subtype discovery. Users can normalize and concatenate multiple data types (e.g., mRNA, methylation, miRNA) into a single matrix and apply well-known methods developed for single-omics analysis, such as ConsensusClusterPlus [5], to determine the subtypes. Such approaches are simple and computationally efficient. However, they do not account for data heterogeneity: different data types might have different scales and dimensions, and might require different normalization procedures.
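
The simple concatenation baseline, and the scale and dimension problems raised in the paragraph above, can be illustrated with a short Python sketch. Per-feature z-scoring removes scale differences (for example, expression values versus methylation beta values), and dividing each block by the square root of its feature count keeps a 2,000-feature data type from outweighing a 300-feature one. The block weighting, the function names `zscore_features` and `concatenate_omics`, and the simulated matrices are hypothetical illustrations, not part of the paper. Note that this approach also requires every data type to cover the same patients in the same order.

```python
import numpy as np


def zscore_features(X):
    """Standardize each feature (column) to mean 0, sd 1; constant columns become 0."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # avoid division by zero for constant features
    return (X - mu) / sd


def concatenate_omics(blocks, weight_by_size=True):
    """Join per-omics matrices (same patients, same row order) into one matrix.

    Each block is z-scored per feature. If weight_by_size is True, each block is
    also divided by sqrt(n_features) so that every data type contributes the same
    total variance regardless of its number of features (an assumption of this
    sketch, not a step described in the paper).
    """
    n = blocks[0].shape[0]
    if any(b.shape[0] != n for b in blocks):
        raise ValueError("all blocks must contain the same patients in the same order")
    scaled = []
    for b in blocks:
        z = zscore_features(b)
        if weight_by_size:
            z = z / np.sqrt(b.shape[1])
        scaled.append(z)
    return np.hstack(scaled)


# Usage: two simulated data types for the same 50 patients.
rng = np.random.default_rng(0)
mrna = rng.normal(8.0, 2.0, size=(50, 2000))    # hypothetical log-expression values
methyl = rng.uniform(0.0, 1.0, size=(50, 300))  # hypothetical beta values in [0, 1]
X = concatenate_omics([mrna, methyl])
```

The combined matrix can then be passed to any single-omics clustering method, which is exactly the convenience, and the limitation, that the paragraph describes.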

Journal: Frontiers in Oncology
Publication Date: Oct 20, 2021
Citations: 6
License type: CC BY 4.0

Similar Papers

A Novel Method for Cancer Subtyping and Risk Prediction Using Consensus Factor Analysis.
Duc Tran ... Uyen Le
Frontiers in Oncology | VOL. 10 | 24 Jun 2020

The cost of hope: Doctors weigh the benefits of new drugs against sky-high costs
Samuel Loewenberg
Molecular Oncology | VOL. 4 | 23 Mar 2010

A role for the transducer of the Hippo pathway, TAZ, in the development of aggressive types of endometrial cancer
Laura Romero-Pérez ... Jose Palacios
Modern Pathology | VOL. 28 | 01 Nov 2015

Human Cancer Classification and Prediction Based on Gene Profiling
H N Megha ... R H Goudar
01 Jan 2018