Machine learning analysis of TCGA cancer data.

Jose Liñares-Blanco, Alejandro Pazos, Carlos Fernandez-Lozano

doi:10.7717/peerj-cs.584

Open Access

Journal: PeerJ Computer Science
Publication Date: Jul 12, 2021
Citations: 20
License type: CC BY 4.0

Affiliation: University of A Coruña

Abstract

In recent years, machine learning (ML) researchers have shifted their focus towards biological problems that are difficult to analyse with standard approaches. Large initiatives such as The Cancer Genome Atlas (TCGA) have made omic data available for training these algorithms. To survey the state of the art, this review covers the main works that have applied ML to TCGA data. First, the principal discoveries made by the TCGA consortium are presented. With this foundation in place, we turn to the main objective of this study: the identification and discussion of the works that have used TCGA data to train different ML approaches. After reviewing more than 100 papers, we classified them according to three pillars: the type of tumour, the type of algorithm and the biological problem predicted. One of the conclusions of this work is that studies are heavily concentrated on two major algorithms: Random Forest and Support Vector Machines. We also observe a rise in the use of deep artificial neural networks. It is also worth emphasizing the increase in integrative models for multi-omic data analysis. The different biological conditions are a consequence of molecular homeostasis, driven by protein-coding regions, regulatory elements and the surrounding environment. Notably, a large number of works use gene expression data, which appears to be the data type researchers prefer when training the different models. The biological problems addressed have been classified into five types: prognosis prediction, tumour subtypes, microsatellite instability (MSI), immunological aspects and certain pathways of interest. A clear trend was detected in which conditions are predicted depending on the type of tumour. For this reason, a greater number of works have focused on the breast invasive carcinoma (BRCA) cohort, while works specific to survival, for example, were centred on the glioblastoma multiforme (GBM) cohort because of its large number of events. Throughout this review, the works and the methodologies used to study TCGA cancer data are examined in depth. Finally, it is intended that this work will serve as a basis for future research in this field.

Highlights

The appearance of the carcinogenic phenotype is the consequence of an alteration of one or more genes
We reviewed more than 100 papers that have used machine learning (ML) approaches with The Cancer Genome Atlas (TCGA) data
In recent years, many cancer studies have applied ML to molecular data

Summary

INTRODUCTION

The appearance of the carcinogenic phenotype is the consequence of an alteration of one or more genes. Machine learning (ML) methods perform well with very large datasets, even when the number of variables in each observation (p) is much greater than the total number of observations (n ≪ p)
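
The point that ML methods can be trained when the number of variables p far exceeds the number of observations n can be illustrated with a linear Support Vector Machine, one of the two algorithms the review finds most common. The following is a minimal sketch, not a method taken from the reviewed works: it trains a linear SVM (without a bias term) using the Pegasos stochastic subgradient method, chosen here only for brevity, on synthetic data with n = 20 samples and p = 500 features, of which just 5 carry signal. The helper names (make_data, pegasos_train, predict, accuracy) and every parameter value are hypothetical choices for illustration. Real TCGA expression matrices would require their own preprocessing.

```python
import random


def make_data(n, p, n_informative, shift, seed):
    """Hypothetical synthetic data: balanced labels in {+1, -1}; only the first
    n_informative features have a class-dependent mean, the rest are pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1  # alternate labels so classes are balanced
        row = [rng.gauss(label * shift if j < n_informative else 0.0, 1.0) for j in range(p)]
        X.append(row)
        y.append(label)
    return X, y


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def pegasos_train(X, y, lam=0.01, n_iter=1000, seed=0):
    """Linear SVM (hinge loss + L2 penalty) fitted with Pegasos stochastic subgradient steps."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    for t in range(1, n_iter + 1):
        i = rng.randrange(len(X))      # pick one training sample at random
        eta = 1.0 / (lam * t)          # Pegasos step size
        scale = 1.0 - eta * lam        # shrinkage coming from the L2 penalty
        if y[i] * dot(w, X[i]) < 1.0:  # hinge loss active: move towards y_i * x_i
            w = [scale * wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
        else:
            w = [scale * wj for wj in w]
    return w


def predict(w, X):
    return [1 if dot(w, x) >= 0 else -1 for x in X]


def accuracy(y_true, y_pred):
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # p = 500 variables but only n = 20 training observations (n << p)
    X_train, y_train = make_data(n=20, p=500, n_informative=5, shift=1.5, seed=1)
    X_test, y_test = make_data(n=200, p=500, n_informative=5, shift=1.5, seed=2)
    w = pegasos_train(X_train, y_train)
    print("train accuracy:", accuracy(y_train, predict(w, X_train)))
    print("test accuracy:", accuracy(y_test, predict(w, X_test)))
```

The L2 penalty (lam) is what keeps this setting well posed: with p > n, many hyperplanes separate the 20 training points, and the penalty selects a large-margin one rather than an arbitrary one.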

SURVEY METHODOLOGY

Findings

CONCLUSIONS
