Abstract

Background: Current popular variant calling pipelines rely on the mapping coordinates of each input read to a reference genome in order to detect variants. Since reads deriving from variant loci that diverge substantially in sequence from the reference are often assigned incorrect mapping coordinates, variant calling pipelines that rely on mapping coordinates can exhibit reduced sensitivity.

Results: In this work we present GeDi, a suffix array-based somatic single nucleotide variant (SNV) calling algorithm that does not rely on read mapping coordinates to detect SNVs and is therefore capable of reference-free and mapping-free SNV detection. GeDi executes with practical runtime and memory resource requirements, is capable of SNV detection at very low allele frequency (<1%), and detects SNVs with high sensitivity at complex variant loci, dramatically outperforming MuTect, a well-established pipeline.

Conclusion: By designing novel suffix array-based SNV calling methods, we have developed a practical SNV calling software, GeDi, that can characterise SNVs at complex variant loci and at low allele frequency, thus increasing the repertoire of detectable SNVs in tumour genomes. We expect GeDi to find use cases in targeted deep sequencing analysis, and to serve as a replacement for and improvement over previous suffix array-based SNV calling methods.

Highlights

  • Current popular variant calling pipelines rely on the mapping coordinates of each input read to a reference genome in order to detect variants

  • Generalised suffix array-based Direct single nucleotide variant (SNV) caller (GeDi) can detect SNVs at allelic frequencies of

  • Despite the advance of reference-based SNV callers, these algorithms are prone to reduced sensitivity at complex variant loci where incorrect read mapping is common

Summary

Introduction

Current popular variant calling pipelines rely on the mapping coordinates of each input read to a reference genome in order to detect variants. In this work we present GeDi, a suffix array-based somatic single nucleotide variant (SNV) calling algorithm that does not rely on read mapping coordinates to detect SNVs and is capable of reference-free and mapping-free SNV detection. To detect SNVs in paired tumour-control NGS datasets, an SNV calling pipeline must compare reads of the tumour dataset against reads of the control dataset that derive from the same genomic location, which requires organising the input data by genomic location. Current popular somatic SNV calling pipelines do this by mapping tumour and control reads to a human reference genome prior to SNV detection.
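To make the idea of organising reads without mapping coordinates more concrete, the following is a minimal sketch, not GeDi's actual algorithm. It assumes a naive generalised suffix array (every suffix of every read, tagged with its dataset of origin and sorted lexicographically), so that identical sequence from tumour and control reads ends up adjacent. The function names, the k-mer length `k`, and the `min_support` threshold are all hypothetical choices made for illustration. Storing suffixes as strings is quadratic in memory and only suitable for toy input.

```python
from itertools import groupby


def build_generalised_suffix_array(tumour_reads, control_reads):
    """Return all read suffixes as sorted (suffix, label) pairs.

    Label "T" marks a tumour read, "C" a control read. This is a naive
    construction for illustration only (assumption: toy-sized input).
    """
    entries = []
    for label, reads in (("T", tumour_reads), ("C", control_reads)):
        for read in reads:
            for start in range(len(read)):
                entries.append((read[start:], label))
    entries.sort()
    return entries


def tumour_specific_kmers(suffixes, k, min_support=2):
    """Report k-mers seen in at least `min_support` tumour suffixes and in no control suffix.

    Because the suffixes are sorted, all suffixes sharing the same
    length-k prefix are contiguous, so one linear scan is enough.
    """
    usable = [(suffix[:k], label) for suffix, label in suffixes if len(suffix) >= k]
    result = {}
    for kmer, group in groupby(usable, key=lambda entry: entry[0]):
        labels = [label for _, label in group]
        tumour_count = labels.count("T")
        if "C" not in labels and tumour_count >= min_support:
            result[kmer] = tumour_count
    return result


# Toy data: two tumour reads carry a T>A substitution at the fourth base.
control = ["ACGTACGT", "ACGTACGT"]
tumour = ["ACGTACGT", "ACGAACGT", "ACGAACGT"]

gsa = build_generalised_suffix_array(tumour, control)
candidates = tumour_specific_kmers(gsa, k=4, min_support=2)
print(candidates)
```

In this toy input, the k-mers that overlap the substituted base appear only among tumour suffixes, which is the kind of signal a mapping-free caller can use as a starting point. A practical implementation would need a compact suffix array, handling of reverse complements and sequencing errors, and a statistical model for calling, none of which are shown here.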
