Sniper: improved SNP discovery by multiply mapping deep sequenced reads

Daniel F Simola, Junhyong Kim

doi:10.1186/gb-2011-12-6-r55

Open Access

Journal: Genome Biology	Publication Date: Jan 1, 2011
Citations: 50	License type: CC BY 2.0

Affiliation: University of Pennsylvania

Abstract

SNP (single nucleotide polymorphism) discovery using next-generation sequencing data remains difficult primarily because of redundant genomic regions, such as interspersed repetitive elements and paralogous genes, present in all eukaryotic genomes. To address this problem, we developed Sniper, a novel multi-locus Bayesian probabilistic model and a computationally efficient algorithm that explicitly incorporates sequence reads that map to multiple genomic loci. Our model fully accounts for sequencing error, template bias, and multi-locus SNP combinations, maintaining high sensitivity and specificity under a broad range of conditions. An implementation of Sniper is freely available at http://kim.bio.upenn.edu/software/sniper.shtml.

Highlights

The advent of next-generation, short-read sequencing (NGS) technologies has enabled large-scale, whole-genome resequencing studies that aim to discover novel single nucleotide polymorphisms (SNPs) and other population genetic variations.
Previous genome resequencing efforts have developed a variety of approaches to identify SNPs, including straightforward decision rules such as minimum coverage and quality cutoffs along with filters that mask reads aligning to repetitive genomic templates [2]; Bayesian algorithms that explicitly model sequencing chemistry and take full advantage of read-specific quality scores [3,4]; unsupervised [5] and supervised [6,7] machine-learning algorithms trained to distinguish sequencing errors from SNPs; and an alignment method that performs read mapping using all four nucleotide probabilities per locus instead of the most probable call [8].
Although a SNP occurring within a repetitive sequence may be identified from overlapping reads that are anchored by unique flanking template, accurate mapping may be impossible if the length of the repetitive sequence is greater than the length of the read.
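
The last point can be illustrated with a toy example. The sketch below is not Sniper's algorithm: it uses plain exact string matching on a made-up genome, and the function name `mapping_loci` and all sequences are hypothetical. It only shows why a read that lies entirely inside a repeat maps to several loci, while a read that reaches into unique flanking sequence is anchored to one.

```python
def mapping_loci(genome: str, read: str) -> list[int]:
    """Return every 0-based start position where `read` matches `genome` exactly."""
    return [
        i
        for i in range(len(genome) - len(read) + 1)
        if genome.startswith(read, i)
    ]


# Hypothetical toy genome: two copies of a 12 bp repeat separated by unique sequence.
repeat = "ACGTACGGTCAA"
genome = "TTGCA" + repeat + "GGATCCTA" + repeat + "CCGTA"

# An 8 bp read taken from inside the repeat is shorter than the repeat,
# so it matches both copies and cannot be placed unambiguously.
inner_read = repeat[2:10]

# A 10 bp read that starts in the unique left flank of the first copy
# overlaps the repeat but matches only one locus.
anchored_read = genome[3:13]

print(mapping_loci(genome, inner_read))     # two loci
print(mapping_loci(genome, anchored_read))  # one locus
```

Discarding the multiply mapped read loses whatever evidence it carries about either repeat copy, which is the situation a multi-locus model is meant to address.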

Summary

Introduction

The advent of next-generation, short-read sequencing (NGS) technologies has enabled large-scale, whole-genome resequencing studies that aim to discover novel SNPs and other population genetic variations. Previous genome resequencing efforts have developed a variety of approaches to identify SNPs, including straightforward decision rules such as minimum coverage and quality cutoffs along with filters that mask reads aligning to repetitive genomic templates [2]; Bayesian algorithms that explicitly model sequencing chemistry and take full advantage of read-specific quality scores [3,4]; unsupervised [5] and supervised [6,7] machine-learning algorithms trained to distinguish sequencing errors from SNPs; and an alignment method that performs read mapping using all four nucleotide probabilities per locus instead of the most probable call [8]. Although these tools have successfully predicted many novel SNPs, genomes themselves contain inherent degeneracy due to redundant paralogous sequences and low-complexity repetitive elements, while NGS data exhibit non-negligible sequencing errors and severe [...]. SNPs occurring in redundant sequence contexts may be missed.
