Bivartect: accurate and memory-saving breakpoint detection by direct read comparison.

Keisuke Shimmura, Yuki Kato, Yukio Kawahara, Alfonso Valencia

doi:10.1093/bioinformatics/btaa059

Open Access

Journal: Bioinformatics (Oxford, England)	Publication Date: Jan 27, 2020
Citations: 4	License type: CC BY 4.0

Affiliation: Osaka University

Abstract

Motivation: Genetic variant calling with high-throughput sequencing data has been recognized as a useful tool for better understanding of disease mechanism and detection of potential off-target sites in genome editing. Since most of the variant calling algorithms rely on initial mapping onto a reference genome and tend to predict many variant candidates, variant calling remains challenging in terms of predicting variants with low false positives.

Results: Here we present Bivartect, a simple yet versatile variant caller based on direct comparison of short sequence reads between normal and mutated samples. Bivartect can detect not only single nucleotide variants but also insertions/deletions, inversions and their complexes. Bivartect achieves high predictive performance with an elaborate memory-saving mechanism, which allows Bivartect to run on a computer with a single node for analyzing small omics data. Tests with simulated benchmark and real genome-editing data indicate that Bivartect was comparable to state-of-the-art variant callers in positive predictive value for detection of single nucleotide variants, even though it yielded a substantially small number of candidates. These results suggest that Bivartect, a reference-free approach, will contribute to the identification of germline mutations as well as off-target sites introduced during genome editing with high accuracy.

Availability and implementation: Bivartect is implemented in C++ and available along with in silico simulated data at https://github.com/ykat0/bivartect.

Supplementary information: Supplementary data are available at Bioinformatics online.

Highlights

Genomic structural variations have been widely investigated at base-pair resolution using the prevailing high-throughput sequencing technologies (Alkan et al., 2011).
Predictive performance was evaluated by calculating sensitivity, positive predictive value (PPV) and F-measure, as defined in the Supplementary Notes.
The results indicated that Bivartect achieved a high PPV (0.986) for single nucleotide variant (SNV) detection, and the third-best balanced accuracy (F-measure, 0.904) for indels, after MuTect2 and Strelka2 (Fig. 2a and b and Supplementary Table S3).
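
The three evaluation metrics named above can be sketched as follows. The exact definitions used in the paper are in its Supplementary Notes, which are not reproduced here, so this sketch assumes the standard definitions: sensitivity = TP / (TP + FN), PPV = TP / (TP + FP), and F-measure as the harmonic mean of the two. The function name and the counts are hypothetical and chosen only for illustration.

```python
def variant_calling_metrics(tp: int, fp: int, fn: int) -> dict[str, float]:
    """Compute sensitivity, PPV and F-measure from call counts.

    Assumes the standard definitions; the paper's own definitions are in
    its Supplementary Notes and may differ in detail.
    tp: true positives, fp: false positives, fn: false negatives.
    """
    # Fraction of true variants that were called.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Fraction of calls that are true variants.
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    # Harmonic mean of sensitivity and PPV.
    total = sensitivity + ppv
    f_measure = 2 * sensitivity * ppv / total if total else 0.0
    return {"sensitivity": sensitivity, "ppv": ppv, "f_measure": f_measure}


# Hypothetical counts, not taken from the paper.
metrics = variant_calling_metrics(tp=90, fp=10, fn=30)
print(metrics)
```

Because PPV depends only on true and false positives, a caller that reports few candidates can still score a high PPV while its sensitivity, and hence its F-measure, is lower.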

Summary

Introduction

Genomic structural variations have been widely investigated at base-pair resolution using the prevailing high-throughput sequencing technologies (Alkan et al., 2011). Such variations include germline and somatic mutations, ranging from single nucleotide variants (SNVs) to structural variants (SVs) of at least 50 bp (Sudmant et al., 2015; The 1000 Genomes Project Consortium, 2015), such as insertions/deletions (indels), inversions or translocations (Weischenfeldt et al., 2013). Given that these variants can be associated with complex diseases, and that somatic mutations may correlate with the progression of cancer, detecting them is essential for elucidating disease mechanisms (Martincorena and Campbell, 2015). Most in silico methods for variant detection rely on initial mapping onto a reference genome (Chen et al., 2009, 2016; Cibulskis et al., 2013; DePristo et al., 2011; Kim et al., 2018; Lai et al., 2016; Larson et al., 2012; Rausch et al., 2012; Wang et al., 2011; Ye et al., 2009). This implies that the quality of variant calling can deteriorate when variant-containing reads are left out of consideration because they fail to map onto the reference genome. These approaches also typically predict many variant candidates and therefore require complex statistical filtering steps after the initial mapping to remove false-positive predictions, which can in turn lower sensitivity.

Methods

Results

Conclusion