MICADo - Looking for Mutations in Targeted PacBio Cancer Data: An Alignment-Free Method.

Justine Rudewicz, Hayssam Soueidan, Richard Iggo, Jonas Bergh, Macha Nikolski, Raluca Uricaru, Hervé Bonnefoi

doi:10.3389/fgene.2016.00214

Abstract

Targeted sequencing is commonly used in clinical applications of NGS technology, since it provides sufficient sequencing depth in the targeted genes of interest and thus supports the best possible downstream analysis. Nevertheless, the accurate discovery and annotation of disease-causing mutations remains a challenging problem even in this favorable context. The difficulty is particularly salient for third-generation sequencing technologies such as PacBio. We present MICADo, a de Bruijn graph based method implemented in Python, which makes it possible to distinguish between patient-specific mutations and other alterations in targeted sequencing data from a cohort of patients. MICADo analyses the NGS reads of each sample in the context of the data of the whole cohort, in order to capture what is specific to that sample with respect to the cohort. MICADo is particularly suitable for sequencing data from highly heterogeneous samples, especially when these involve high rates of non-uniform sequencing errors. It was validated on PacBio sequencing datasets from several cohorts of patients. A comparison with two widely used tools, VarScan and GATK, shows that MICADo is more accurate, especially when true mutations have frequencies close to background noise. The source code is available at http://github.com/cbib/MICADo.
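The abstract's central idea, examining each sample's reads through a de Bruijn graph and comparing them against the rest of the cohort, can be illustrated with a toy sketch. This is not MICADo's implementation: the function names (`build_debruijn`, `alternative_paths`, `specific_kmers`, `spell`), the thresholds `min_count` and `max_other`, and the toy sequences are all hypothetical. The sketch builds a node-centric de Bruijn graph from one sample's reads (nodes are k-mers, edges link k-mers that overlap by k-1 bases), searches for paths that leave the reference k-mers and later rejoin them, and keeps a path only if its internal k-mers are frequent in the sample but rare in the other samples of the cohort.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Yield all overlapping k-mers of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_debruijn(reads, k):
    """Node-centric de Bruijn graph: nodes are k-mers, and an edge joins two
    k-mers that are consecutive (overlap by k-1 bases) within some read.
    Returns the adjacency sets and the k-mer counts over all reads."""
    edges = defaultdict(set)
    counts = Counter()
    for read in reads:
        ks = list(kmers(read, k))
        counts.update(ks)
        for a, b in zip(ks, ks[1:]):
            edges[a].add(b)
    return edges, counts


def alternative_paths(edges, ref_kmers, max_len=50):
    """Paths that leave the reference at a reference k-mer, traverse only
    non-reference k-mers, and re-enter the reference (a 'bubble')."""
    paths = []
    for start in ref_kmers:
        for nxt in edges.get(start, ()):
            if nxt in ref_kmers:
                continue
            stack = [[start, nxt]]
            while stack:
                path = stack.pop()
                if len(path) > max_len:
                    continue  # give up on overly long detours
                for succ in edges.get(path[-1], ()):
                    if succ in ref_kmers:
                        paths.append(path + [succ])
                    elif succ not in path:  # avoid cycles
                        stack.append(path + [succ])
    return paths


def specific_kmers(sample_reads, other_samples, k, min_count=2, max_other=0):
    """Hypothetical specificity rule: k-mers seen at least min_count times in
    the sample that reach min_count in at most max_other other samples."""
    _, sample_counts = build_debruijn(sample_reads, k)
    other_counts = [build_debruijn(reads, k)[1] for reads in other_samples]
    keep = set()
    for km, c in sample_counts.items():
        if c < min_count:
            continue
        n_other = sum(1 for oc in other_counts if oc[km] >= min_count)
        if n_other <= max_other:
            keep.add(km)
    return keep


def spell(path):
    """Reconstruct the sequence spelled by a path of overlapping k-mers."""
    return path[0] + "".join(km[-1] for km in path[1:])


if __name__ == "__main__":
    k = 4
    ref = "AACCGGTTAC"
    sample = ["AACCGATTAC", "AACCGATTAC", "AACCGGTTAC"]  # toy G>A substitution
    others = [["AACCGGTTAC", "AACCGGTTAC"], ["AACCGGTTAC"]]
    edges, _ = build_debruijn(sample, k)
    spec = specific_kmers(sample, others, k)
    for p in alternative_paths(edges, set(kmers(ref, k))):
        if all(km in spec for km in p[1:-1]):  # inner k-mers must be sample-specific
            print(spell(p))  # ACCGATTAC: the alternative to reference ACCGGTTAC
```

In this toy run the only reported path spells ACCGATTAC, the sample's alternative to the reference segment ACCGGTTAC. Requiring the inner k-mers to be rare elsewhere in the cohort is only a crude stand-in for the cohort-aware specificity described in the abstract; the statistics MICADo actually uses are not given in this summary.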

Highlights

Capturing known cancer genes by next-generation sequencing, an approach known as “gene panel” or targeted sequencing, is commonly used for tumor genotyping
MICADo was evaluated on Pacific Biosciences (PacBio) sequencing datasets: (i) a novel sequencing of TP53 of a breast cancer cohort, (ii) a publicly available dataset of FLT3 sequencing of an acute myeloid leukemia cohort, and (iii) a synthetic dataset
Three pipelines based on GATK, VarScan, and MICADo were evaluated on both synthetic and real data

Summary

INTRODUCTION

Capturing known cancer genes by next-generation sequencing, an approach known as “gene panel” or targeted sequencing, is commonly used for tumor genotyping. Despite the existence of numerous computational solutions, calling somatic mutations in cancer data remains challenging due to factors such as technical artifacts, sequencing errors, biases of alignment algorithms, DNA contamination (control samples contaminated with tumor DNA), and tumor heterogeneity. This issue is even more salient for third-generation sequencing data such as PacBio: since very high read depths are required to achieve sequence accuracy close to that of Illumina and Ion Torrent (Quail et al., 2012), variant calling potentially suffers from high false positive and false negative rates. MICADo was evaluated on PacBio sequencing datasets: (i) a novel sequencing of TP53 in a breast cancer cohort, (ii) a publicly available dataset of FLT3 sequencing in an acute myeloid leukemia cohort, and (iii) a synthetic dataset.
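Why very high read depths are needed to reach accurate sequence can be made concrete with a deliberately simple model. The model is an assumption and is not taken from the paper: each read independently reports a wrong base at a given position with a fixed probability, and the consensus is a majority vote. The per-read error rate of 0.13 and the function name `majority_error` are hypothetical. Because the model counts every error as a vote against the true base and ignores the indel-dominated, non-uniform errors of real PacBio data, it only illustrates how consensus error shrinks with depth.

```python
from math import comb


def majority_error(depth: int, read_error: float) -> float:
    """Probability that more than half of `depth` reads are wrong at a position,
    i.e. that a simple majority vote does not return the true base.
    Assumes independent errors at a fixed per-read rate (a toy model)."""
    return sum(
        comb(depth, i) * read_error**i * (1 - read_error) ** (depth - i)
        for i in range(depth // 2 + 1, depth + 1)
    )


if __name__ == "__main__":
    read_error = 0.13  # hypothetical per-read error rate
    for depth in (1, 5, 15, 31):  # odd depths avoid ties in the vote
        print(f"depth {depth:>2}: consensus error ~ {majority_error(depth, read_error):.1e}")
```

Under these assumptions the consensus error falls roughly geometrically with depth, which is why error rates higher than those of Illumina or Ion Torrent must be offset by depth. The model says nothing about low-frequency mutations in heterogeneous samples, where reads carrying a true variant can be as rare as the errors themselves.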

MICADo Approach

Datasets

Construction of de Bruijn Graphs

Alternative Path Search

Alternative Path Specificity and Variant Calling

RESULTS

Evaluation on Synthetic Data

TP53 Targeted Data

FLT3 Targeted Data

DISCUSSION


Journal: Frontiers in Genetics
Publication Date: Dec 8, 2016
Citations: 4
License type: cc-by
