Abstract 388: A benchmark study for identifying cancer drivers in the non-coding part of the genome

Damien Drubay,Stefan Michiels,Daniel Gautheret

doi:10.1158/1538-7445.am2017-388

Abstract

Purpose: Prioritizing potentially deleterious variants is essential for guiding the search for, and validation of, new pathological variants across the vast expanse of the genome. Many tools have been introduced to detect new variants in the coding part of the genome, where detailed knowledge of coding sequences has led to efficient statistical models for cancer driver discovery. The challenge is greater for the non-coding part of the genome because of its size (>98% of the genome) and because it contains many non-functional or unknown features. Several deleteriousness scores have been proposed in the last decade, but no large-scale comparison has yet assessed their ability to identify cancer drivers.

Materials and methods: We compared the leading scoring systems (CADD, FATHMM-MKL, Funseq2 and GWAVA) and some recent competitors (DANN, SNP and SOM scores) for their ability to discriminate assumed pathologic variants in the non-coding genome (as identified by 928 ClinVar variants or 44,158 recurrent COSMIC mutations) from assumed non-pathologic variants (100,000 randomly sampled 1000 Genomes Project variants with minor allele frequency > 1%). To define pathogenic variants using COSMIC as the reference, we varied the threshold on the number of COSMIC recurrences from 2 to 10. We compared the sensitivity, specificity and precision of the scoring systems using the area under the receiver operating characteristic curve (AUCROC) and the area under the precision-recall curve (AUCPR).

Results: Most scores had good sensitivity and specificity for detecting the ClinVar variants (AUCROC>0.90). In terms of precision on ClinVar variants, the top-performing methods were CADD (AUCPR=0.84), DANN (AUCPR=0.83) and, to a lesser extent, FATHMM-MKL (AUCPR=0.75). With a threshold of 3 recurrences to define true pathogenicity of COSMIC variants, AUCROC ranged from 0.52 (DANN) to 0.80 (GWAVA), but precision was low, with AUCPR ranging from 0.05 (DANN, SOMmelanoma) to 0.18 (GWAVA). Raising the pathogenicity threshold to 10 recurrences increased AUCROC values (from 0.50 for SOMmelanoma to 0.89 for GWAVA) but decreased precision (AUCPR ranging from 0 to 0.02).

Discussion: This large-scale benchmark study distinguished CADD as the best tool for detecting variants with features similar to those of ClinVar, which are mainly located in protein-coding regions. However, based on the COSMIC results, GWAVA outperformed CADD for variants in other regions, including lincRNAs, pseudogenes and other parts of the genome "dark matter", which are attracting increasing interest. This finding should nevertheless be weighed against the potential presence of non-pathologic variants in the COSMIC database due to sequencing errors, and against the limitations of recurrence as a criterion for pathologic status in unstable, fragile regions of the genome. A gold standard as consistent as ClinVar will be needed for these regions to confirm our tool ranking.

Citation Format: Damien Drubay, Daniel Gautheret, Stefan Michiels. A benchmark study for identifying cancer drivers in the non-coding part of the genome [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2017; 2017 Apr 1-5; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2017;77(13 Suppl):Abstract nr 388. doi:10.1158/1538-7445.AM2017-388
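
The evaluation described in the abstract (ROC and precision-recall AUCs, repeated for COSMIC recurrence thresholds from 2 to 10) can be illustrated with a minimal Python sketch. The abstract does not say how the AUCs were computed or how COSMIC variants below the threshold were handled, so everything here is an assumption: the function names `auc_roc`, `auc_pr` and `sweep_recurrence` are hypothetical, AUCPR is approximated by average precision, variants below the threshold are simply dropped, and higher scores are taken to mean "more deleterious".

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def auc_roc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a random positive outscores a random
    negative (ties count one half). Quadratic time: for illustration only."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_pr(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Average precision, a step-wise summary of the precision-recall curve.
    Tied scores are not treated specially in this sketch."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    true_pos = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            true_pos += 1
            total += true_pos / rank  # precision at this recall step
    return total / n_pos


def sweep_recurrence(
    cosmic: Sequence[Tuple[float, int]],   # (score, recurrence count), assumed layout
    control_scores: Sequence[float],       # scores of assumed non-pathologic variants
    thresholds: Iterable[int] = range(2, 11),
) -> Dict[int, Tuple[float, float, float]]:
    """For each threshold, return (AUCROC, AUCPR, positive prevalence)."""
    controls: List[float] = list(control_scores)
    results: Dict[int, Tuple[float, float, float]] = {}
    for t in thresholds:
        positives = [s for s, r in cosmic if r >= t]
        if not positives:
            continue  # no variant reaches this recurrence threshold
        scores = positives + controls
        labels = [1] * len(positives) + [0] * len(controls)
        prevalence = len(positives) / len(scores)
        results[t] = (auc_roc(scores, labels), auc_pr(scores, labels), prevalence)
    return results
```

Reporting the prevalence next to each AUCPR is a deliberate choice: a scorer that ranks at random has an expected AUCPR close to the fraction of positives, whereas its AUCROC stays near 0.5 regardless of class balance. Because raising the recurrence threshold shrinks the positive set against a fixed set of controls, this may partly explain why AUCPR can fall while AUCROC rises.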

Journal: Cancer Research
Publication Date: Jul 1, 2017
Citations: 1
