A pan-cancer somatic mutation embedding using autoencoders

Martin Palazzo, Pierre Beauseroy, Patricio Yankilevich

doi:10.1186/s12859-019-3298-z

Abstract

Background: Next generation sequencing instruments are providing new opportunities for comprehensive analyses of cancer genomes. The increasing availability of tumor data makes it possible to study the complexity of cancer with machine learning methods. The large available repositories of high dimensional tumor samples characterised with germline and somatic mutation data require advanced computational modelling for data interpretation. In this work, we propose to analyze this complex data with neural network learning, a methodology that has made impressive advances in image and natural language processing.

Results: Here we present a tumor mutation profile analysis pipeline based on an autoencoder model, which is used to discover better representations of lower dimensionality from large somatic mutation data of 40 different tumor types and subtypes. Kernel learning with hierarchical cluster analysis is used to assess the quality of the learned somatic mutation embedding, on which support vector machine models are used to accurately classify tumor subtypes.

Conclusions: The learned latent space maps the original samples into a much lower dimension while keeping the biological signals from the original tumor samples. This pipeline and the resulting embedding allow an easier exploration of the heterogeneity within and across tumor types and an accurate classification of tumor samples in the pan-cancer somatic mutation landscape.
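The evaluation step described in the abstract (a kernel, hierarchical clustering, and support vector machine classification on the embedding) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the embedding `Z` is synthetic Gaussian data standing in for learned latent vectors, the three groups stand in for tumor subtypes, and the kernel width `gamma`, the average linkage, and the 5-fold cross-validation are arbitrary choices, not the authors' settings. It uses NumPy, SciPy, and scikit-learn.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical stand-in for a learned embedding: 3 groups of 30 samples
# in a 4-dimensional latent space (NOT real tumor data).
centers = np.array([[0, 0, 0, 0], [5, 5, 0, 0], [0, 5, 5, 0]], dtype=float)
labels = np.repeat(np.arange(3), 30)
Z = centers[labels] + rng.normal(0.0, 0.3, size=(90, 4))

# Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2), in condensed form.
gamma = 0.1  # assumed kernel width
sq_dist = pdist(Z, metric="sqeuclidean")
K = np.exp(-gamma * sq_dist)

# Distance induced by the kernel in feature space: sqrt(k(x,x) + k(y,y) - 2k(x,y)),
# which is sqrt(2 - 2k(x,y)) for a Gaussian kernel.
kernel_dist = np.sqrt(2.0 - 2.0 * K)

# Hierarchical clustering on the kernel-induced distances, cut into 3 clusters.
tree = linkage(kernel_dist, method="average")
clusters = fcluster(tree, t=3, criterion="maxclust")

# SVM with the same Gaussian kernel, scored by 5-fold cross-validation.
scores = cross_val_score(SVC(kernel="rbf", gamma=gamma), Z, labels, cv=5)
print("clusters found:", len(set(clusters)))
print("mean CV accuracy:", scores.mean())
```

If the clusters recovered without labels line up with the known groups, and the classifier separates them well, the embedding has kept the group structure. That is the sense in which clustering and classification serve as quality checks here.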

Highlights

Next generation sequencing instruments are providing new opportunities for comprehensive analyses of cancer genomes
In this work a neural network maps tumors characterized by mutational profiles from a high dimensional space, built from somatically mutated genes, to a low dimensional space using an autoencoder as a nonlinear function
The input tumor mutational profiles are transformed into a latent space as dense vectors
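The mapping described in these highlights can be sketched with a small autoencoder written in NumPy. This is an illustration under stated assumptions, not the authors' model: the binary sample-by-gene matrix `X` is randomly generated, and the single hidden layer, `tanh` encoder, sigmoid decoder, cross-entropy loss, learning rate, and latent size are all hypothetical choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode(X, W1, b1):
    """Map binary mutation profiles to dense latent vectors."""
    return np.tanh(X @ W1 + b1)

def train_autoencoder(X, latent_dim=2, epochs=500, lr=1.0, seed=0):
    """Full-batch gradient descent on mean binary cross-entropy."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, size=(d, latent_dim))
    b1 = np.zeros(latent_dim)
    W2 = rng.normal(0.0, 0.1, size=(latent_dim, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        Z = encode(X, W1, b1)              # encoder: high-dim -> latent
        X_hat = sigmoid(Z @ W2 + b2)       # decoder: latent -> reconstruction
        P = np.clip(X_hat, 1e-9, 1 - 1e-9)
        losses.append(float(-np.mean(X * np.log(P) + (1 - X) * np.log(1 - P))))
        # Backpropagation: gradient of the mean loss w.r.t. the decoder logits.
        d_logits = (X_hat - X) / (n * d)
        d_W2 = Z.T @ d_logits
        d_b2 = d_logits.sum(axis=0)
        d_pre = (d_logits @ W2.T) * (1.0 - Z ** 2)   # through tanh
        d_W1 = X.T @ d_pre
        d_b1 = d_pre.sum(axis=0)
        W1 -= lr * d_W1
        b1 -= lr * d_b1
        W2 -= lr * d_W2
        b2 -= lr * d_b2
    return (W1, b1, W2, b2), losses

# Synthetic data (assumption): 60 samples x 50 genes, 1 = gene is mutated.
# Two groups of samples with different frequently mutated gene sets.
rng = np.random.default_rng(1)
prob = np.full((60, 50), 0.05)
prob[:30, :10] = 0.6
prob[30:, 10:20] = 0.6
X = (rng.random((60, 50)) < prob).astype(float)

params, losses = train_autoencoder(X)
Z = encode(X, params[0], params[1])   # dense latent vectors, one row per sample
print(Z.shape, losses[0], losses[-1])
```

Each row of `Z` is the dense, low dimensional representation of one sparse binary profile. Any downstream clustering or classification would operate on `Z` rather than on `X`.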

Summary

Introduction

Next generation sequencing instruments are providing new opportunities for comprehensive analyses of cancer genomes. Recent years have been characterized by the availability of data repositories providing access to large-scale collaborative cancer projects [1, 2]. These databases contain data from thousands of tumor samples from patients all over the world, labeled by tumor type, subtype and other clinical factors such as age and prognosis. The available tumor data includes different layers of biological signals acquired by state-of-the-art omics technologies (e.g., genomics, transcriptomics, proteomics, metabolomics, etc.). Each layer captures a signature of the tumor through a different class of macromolecules. Another characteristic is that each omic layer



Journal: BMC Bioinformatics
Publication Date: Dec 1, 2019
Citations: 13
License type: open-access


