Dark-matter matters: Discriminating subtle blood cancers using the darkest DNA.

Laxmi Parida, Claudia Haferlach, Filippo Utro, Sven Twardziok, Chaya Levovitz, Stephan Hutter, Niroshan Nadarajah, Constance Baer, Torsten Haferlach, Manja Meggendorfer, Kahn Rhrissorrakrai, Wencke Walter, Wolfgang Kern

doi:10.1371/journal.pcbi.1007332

Open Access

Abstract

The confluence of deep sequencing and powerful machine learning is providing an unprecedented peek at the darkest of the dark genomic matter, the non-coding genomic regions lacking any functional annotation. While deep sequencing uncovers rare tumor variants, the heterogeneity of the disease confounds the best of machine learning (ML) algorithms. Here we set out to answer whether the dark-matter of the genome encompasses signals that can distinguish the fine subtypes of disease that are otherwise genomically indistinguishable. We introduce a novel stochastic regularization, ReVeaL, that empowers ML to discriminate subtle cancer subtypes even from the same ‘cell of origin’. Analogous to heritability, which is implicitly defined on the whole genome, we use predictability (F1 score), which can be defined on portions of the genome. In an effort to distinguish cancer subtypes using dark-matter DNA, we applied ReVeaL to a new WGS dataset from 727 patient samples with seven forms of hematological cancers and assessed the predictability over several genomic regions including genic, non-dark, non-coding, non-genic, and dark. ReVeaL enabled improved discrimination of cancer subtypes for all segments of the genome. The non-genic, non-coding and dark-matter regions had the highest F1 scores, with dark-matter having the highest level of predictability. Based on ReVeaL’s predictability of different genomic regions, dark-matter contains enough signal to significantly discriminate fine subtypes of disease. Hence, the agglomeration of rare variants, even in the hitherto unannotated and ill-understood regions of the genome, may play a substantial role in the disease etiology and deserves much more attention.
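
The abstract uses the F1 score as its measure of predictability. As a rough illustration of that metric only, the sketch below computes per-class F1 and an unweighted (macro) average for a multi-class labeling. The function names, the subtype labels, and the choice of macro averaging are assumptions made for this example; the abstract does not say how the authors aggregated F1 across the seven cancer types, and this is not the ReVeaL method itself.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return {label: F1} where F1 = 2*TP / (2*TP + FP + FN) for each label."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (an assumed aggregation, see lead-in)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical labels, for illustration only.
y_true = ["subtype_A", "subtype_A", "subtype_B", "subtype_B"]
y_pred = ["subtype_A", "subtype_B", "subtype_B", "subtype_B"]
print(f1_per_class(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

Because the score depends only on true and predicted labels, it can be computed separately for a classifier trained on each genomic region (genic, non-coding, dark, and so on), which is what makes it definable on portions of the genome.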

Highlights

Since the completion of the Human Genome Project, progress has been made in understanding the genome and diseases of the genome such as cancer.
Equipped with ultra-deep whole genome sequencing (WGS) capabilities that dig out ever rarer variants and current machine learning (ML) capabilities with the potential to process large amounts of data undeterred by noise at various scales, we focus here on blood cancer.
We found the predictability (F1 score) of an array of ML and Artificial Intelligence (AI) algorithms on patient genomic data to be disappointingly poor when we considered multiple types of features including individual alleles, individual genes (S5 Table), and windows of mutations from different genomic regions (Fig 1B).

Summary

Introduction

Since the completion of the Human Genome Project, progress has been made in understanding the genome and diseases of the genome such as cancer. Large gaps continue to exist in our knowledge of mutational (genomic) markers vis-à-vis subtle disease subtypes. The primary focus has been on coding genes, the assumed instigators of cancer. The intrinsic focus of whole exome sequencing (WES) on coding DNA, called the exome, has naturally reinforced the centrality of coding genes as “cancer drivers” by exclusively discovering coding alterations associated with disease etiology. Classical oncogenetics holds that passenger mutations accompany driver mutations throughout the genome but are inconsequential to tumorigenesis [1]. This model has been redefined with the suggestion that these passenger mutations, whether in coding or noncoding DNA, might have a role in cancer progression [2]. The aggregate effect of multiple weak passenger mutations may have a strong influence on tumorigenesis.

Journal: PLOS Computational Biology
Publication Date: Aug 30, 2019
Citations: 7
License type: CC BY 4.0
