QualitySNP: a pipeline for detecting single nucleotide polymorphisms and insertions/deletions in EST data from diploid and polyploid species.

Jifeng Tang, Ben Vosman, C Gerard Van Der Linden, Jack Am Leunissen, Roeland E Voorrips

doi:10.1186/1471-2105-7-438


Open Access

https://doi.org/10.1186/1471-2105-7-438


Journal: BMC Bioinformatics	Publication Date: Oct 9, 2006
Citations: 157	License type: cc-by

Affiliation: Wageningen University & Research

Abstract

Background: Single nucleotide polymorphisms (SNPs) are important tools in studying complex genetic traits and genome evolution. Computational strategies for SNP discovery make use of the large number of sequences present in public databases (in most cases as expressed sequence tags (ESTs)) and are considered to be faster and more cost-effective than experimental procedures. A major challenge in computational SNP discovery is distinguishing allelic variation from sequence variation between paralogous sequences, in addition to recognizing sequencing errors. For the majority of the public EST sequences, trace or quality files are lacking, which makes detection of reliable SNPs even more difficult because it has to rely on sequence comparisons only.

Results: We have developed a new algorithm to detect reliable SNPs and insertions/deletions (indels) in EST data, both with and without quality files. Implemented in a pipeline called QualitySNP, it uses three filters for the identification of reliable SNPs. Filter 1 screens for all potential SNPs and identifies variation between or within genotypes. Filter 2 is the core filter that uses a haplotype-based strategy to detect reliable SNPs. Clusters with potential paralogs as well as false SNPs caused by sequencing errors are identified. Filter 3 screens SNPs by calculating a confidence score, based upon sequence redundancy and quality. Non-synonymous SNPs are subsequently identified by detecting open reading frames of consensus sequences (contigs) with SNPs. The pipeline includes a data storage and retrieval system for haplotypes, SNPs and alignments. QualitySNP's versatility is demonstrated by the identification of SNPs in EST datasets from potato, chicken and humans.

Conclusion: QualitySNP is an efficient tool for SNP detection, storage and retrieval in diploid as well as polyploid species. It is available for running on Linux or UNIX systems. The program, test data, and user manual are available at and as Additional files.
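The haplotype-based idea behind Filter 2 can be illustrated with a small sketch. This is not the QualitySNP implementation: the abstract does not give the filter's rules, so the function names (`group_into_haplotypes`, `flag_cluster`), the `min_support` threshold, and the `max_expected` limit are all assumptions made for illustration. The sketch groups aligned ESTs by the alleles they carry at potential SNP positions, treats patterns seen in only one EST as likely sequencing errors, and flags a cluster as a possible paralog mix when more well-supported haplotypes remain than expected.

```python
from collections import defaultdict


def group_into_haplotypes(ests, snp_sites):
    """Group aligned ESTs by the alleles they carry at the given SNP columns."""
    haplotypes = defaultdict(list)
    for name, seq in ests.items():
        pattern = "".join(seq[i] for i in snp_sites)
        haplotypes[pattern].append(name)
    return dict(haplotypes)


def flag_cluster(haplotypes, max_expected, min_support=2):
    """Keep haplotypes backed by at least min_support ESTs (hypothetical
    threshold) and report whether more remain than max_expected, which
    would hint at paralogous sequences in the cluster."""
    supported = {p: m for p, m in haplotypes.items() if len(m) >= min_support}
    return supported, len(supported) > max_expected


# Toy alignment: equal-length sequences, no gaps (a simplifying assumption).
ests = {
    "est1": "ACGTACGT",
    "est2": "ACGTACGT",
    "est3": "ACCTACAT",
    "est4": "ACCTACAT",
    "est5": "ACGTACAT",  # pattern seen once: treated here as a likely error
}
snp_sites = [2, 6]  # zero-based alignment columns of the potential SNPs

haps = group_into_haplotypes(ests, snp_sites)
supported, suspicious = flag_cluster(haps, max_expected=2)
print(supported, suspicious)
```

The value passed as `max_expected` would in practice depend on ploidy and on how many genotypes contributed ESTs to the cluster; the sketch leaves that choice to the caller.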

Highlights

Single nucleotide polymorphisms (SNPs) are important tools in studying complex genetic traits and genome evolution
QualitySNP's versatility is demonstrated by the identification of single nucleotide polymorphisms (SNPs) in EST datasets from potato, chicken and humans
The new pipeline for SNP detection presented here distinguishes itself from other programs mainly in the approach it takes for detecting sequencing errors and paralogous sequences

Summary

Introduction

Single nucleotide polymorphisms (SNPs) are important tools in studying complex genetic traits and genome evolution. Computational strategies for SNP discovery make use of the large number of sequences present in public databases (in most cases as expressed sequence tags (ESTs)) and are considered to be faster and more cost-effective than experimental procedures. Sequence variations in the genomic DNA of individuals of the same species or related species are typically SNPs or small insertions/deletions (indels) [1,2]. Because of their abundance and low mutation rate within the genome, they are the most common type of genetic marker [3] for studying complex genetic traits and genome evolution [4]. Two existing programs, autoSNP and SNiPpER, both rely on sequence redundancy for the initial detection of SNPs, and detect and filter out sequencing errors by analyzing SNP patterns.
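Redundancy-based detection of candidate SNPs, the general approach mentioned above, can be sketched as follows. This is a generic illustration and not the code of autoSNP, SNiPpER, or QualitySNP: the function name `candidate_snps` and the `min_allele_count` threshold are assumptions. A column of the alignment is reported only when at least two different bases are each observed in several sequences, so a variant seen in a single read is not reported.

```python
from collections import Counter


def candidate_snps(alignment, min_allele_count=2):
    """Return {column: {base: count}} for columns where at least two
    different bases each occur in min_allele_count or more sequences.

    Assumes all sequences are aligned to the same length; characters
    other than A, C, G, T (gaps, N) are ignored."""
    candidates = {}
    for col in range(len(alignment[0])):
        counts = Counter(seq[col] for seq in alignment if seq[col] in "ACGT")
        supported = {b: n for b, n in counts.items() if n >= min_allele_count}
        if len(supported) >= 2:
            candidates[col] = supported
    return candidates


alignment = [
    "ACGTACGT",
    "ACGTACGT",
    "ACCTACGT",
    "ACCTACGT",
    "ACGTTCGT",  # lone T at column 4: below the redundancy threshold
]
result = candidate_snps(alignment)
print(result)
```

Raising `min_allele_count` trades sensitivity for fewer false positives; a threshold on counts alone cannot separate alleles from paralogous variation.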

