Abstract

An important step in somatic variant calling algorithms for deep sequencing data is quantifying sequencing errors. For targeted sequencing in which hotspot mutations are of interest, site-specific error estimation allows more accurate calling. Site-specific error rates are often estimated from a panel of normal samples, which is limited in size and subject to sampling bias and variance. We propose a novel statistical validation method for single-nucleotide variation (SNV) calling based on historical data. The method extracts high-quality reads from Binary Alignment/Map (BAM) files, identifies the negative samples in the data, and builds a statistical model to call individual samples. It is particularly useful for detecting low-frequency variants that may be missed by traditional panel-of-normals-based SNV calling methods. The proposed method makes it possible to launch a simple, parallel validation pipeline for SNV calling and to improve the detection limit.
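To make the idea of site-specific error estimation concrete, the following is a minimal sketch rather than the method proposed here. It assumes that alternate-allele read counts and depths at one site have already been extracted from the BAM files, that the negative samples are already identified, and that errors follow a simple binomial model. The function names, the pseudocount, and the significance threshold `alpha` are all hypothetical choices made for illustration.

```python
from math import exp, lgamma, log


def estimate_site_error_rate(negatives):
    """Pool (alt_reads, depth) pairs from presumed-negative samples at one site.

    A +1/+2 pseudocount (an assumption of this sketch) keeps the rate
    strictly between 0 and 1 even when no error reads were observed.
    """
    alt = sum(a for a, _ in negatives)
    depth = sum(d for _, d in negatives)
    return (alt + 1) / (depth + 2)


def log_binom_pmf(i, n, p):
    """Log of the Binomial(n, p) probability mass at i, via lgamma."""
    return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log(1 - p))


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed as 1 - P(X < k)."""
    if k <= 0:
        return 1.0
    lower = sum(exp(log_binom_pmf(i, n, p)) for i in range(k))
    return max(0.0, 1.0 - lower)


def call_variant(alt_reads, depth, error_rate, alpha=1e-6):
    """Call a variant when the alt count is unlikely under the error model."""
    return binomial_tail(alt_reads, depth, error_rate) < alpha


# Hypothetical counts at one hotspot site in three historical negative samples.
negatives = [(2, 10000), (1, 10000), (3, 10000)]
rate = estimate_site_error_rate(negatives)

print(call_variant(15, 5000, rate))  # many more alt reads than the error rate predicts
print(call_variant(2, 5000, rate))   # alt count consistent with background error
```

Because each site and each sample is tested independently in this sketch, the calls can be distributed across workers without coordination, which is one way to read the "simple and parallel" property described above.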
