Snagger: A user-friendly program for incorporating additional information for tagSNP selection

Christopher K Edlund,David V Conti,Won H Lee,Dalin Li,David J Van Den Berg

doi:10.1186/1471-2105-9-174

Open Access

https://doi.org/10.1186/1471-2105-9-174

Journal: BMC Bioinformatics (2008, 9:174)
Publication Date: Mar 27, 2008
Citations: 57
License type: cc-by

Affiliation: University of Southern California

Abstract

Background: There has been considerable effort focused on developing efficient programs for tagging single-nucleotide polymorphisms (SNPs). Many of these programs do not account for potential reduced genomic coverage resulting from genotyping failures, nor do they preferentially select SNPs based on functionality, which may be more likely to be biologically important.

Results: We have developed a user-friendly and efficient software program, Snagger, as an extension to the existing open-source software, Haploview, which uses pairwise r² linkage disequilibrium between SNPs to select tagSNPs. Snagger distinguishes itself from existing SNP selection algorithms, including Tagger, by providing user options that allow for: (1) prioritization of tagSNPs based on certain characteristics, including platform-specific design scores, functionality (i.e., coding status), and chromosomal position, (2) efficient selection of SNPs across multiple populations, (3) selection of tagSNPs outside defined genomic regions to improve coverage and genotyping success, and (4) picking of surrogate tagSNPs that serve as backups for tagSNPs whose failure would result in a significant loss of data. Using HapMap genotype data from ten ENCODE regions and design scores for the Illumina platform, we show similar coverage and design score distribution and fewer total tagSNPs selected by Snagger compared to the web server Tagger.

Conclusion: Snagger improves upon currently available tagSNP software packages by providing a means for researchers to select tagSNPs that reliably capture genetic variation across multiple populations while accounting for significant genotyping failure risk and prioritizing on SNP-specific characteristics.
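To make the idea of pairwise r² tagging with prioritization concrete, the following is a minimal Python sketch of a generic greedy tagger. It is not Snagger's or Tagger's actual implementation: the function names, the 0.8 threshold default, the toy genotypes, and the single `scores` value per SNP are all assumptions made for illustration, and the squared correlation of genotype allele counts is used as a simple stand-in for haplotype-based r².

```python
from itertools import combinations


def r_squared(a, b):
    """Squared Pearson correlation between two allele-count vectors (0/1/2)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return cov * cov / (var_a * var_b)


def greedy_tag_snps(genotypes, scores, threshold=0.8):
    """Pick tags greedily until every SNP is captured at r^2 >= threshold.

    genotypes: dict of SNP id -> list of allele counts, one per sample
    scores:    dict of SNP id -> hypothetical priority (higher is preferred),
               used only to break ties between equally useful candidates
    """
    snps = list(genotypes)
    # captures[s] holds every SNP that s could stand in for, including itself.
    captures = {s: {s} for s in snps}
    for s, t in combinations(snps, 2):
        if r_squared(genotypes[s], genotypes[t]) >= threshold:
            captures[s].add(t)
            captures[t].add(s)

    untagged = set(snps)
    tags = []
    tagged_by = {}
    while untagged:
        # Prefer the SNP capturing the most untagged SNPs, then the best score.
        best = max(
            snps,
            key=lambda s: (len(captures[s] & untagged), scores.get(s, 0.0)),
        )
        tags.append(best)
        for s in captures[best] & untagged:
            tagged_by[s] = best
        untagged -= captures[best]
    return tags, tagged_by


# Toy data (invented): rs1, rs2 and rs3 are in perfect pairwise LD; rs4 is not.
genotypes = {
    "rs1": [0, 1, 2, 0, 1, 2],
    "rs2": [0, 1, 2, 0, 1, 2],
    "rs3": [2, 1, 0, 2, 1, 0],
    "rs4": [0, 0, 1, 1, 2, 2],
}
scores = {"rs1": 0.4, "rs2": 0.9, "rs3": 0.6, "rs4": 0.5}
tags, tagged_by = greedy_tag_snps(genotypes, scores)
```

In this toy example the three SNPs in perfect LD are equally good tags, so the hypothetical score decides which one is chosen. A real tool would also need to handle the multi-population, out-of-region, and surrogate-tag options described above, which this sketch omits.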

Highlights

There has been considerable effort focused on developing efficient programs for tagging single-nucleotide polymorphisms (SNPs)
One of these methods includes a preliminary stage of genotyping in which linkage disequilibrium (LD) or haplotype block structure is estimated by genotyping a set of evenly distributed single nucleotide polymorphisms (SNPs) across one or more genes for a sample set representative of a given population
TagSNPs chosen by Snagger had comparable, if not higher, design scores than those selected by Tagger (Table 1b)

Summary

Introduction

There has been extensive effort to develop and implement strategies for efficient selection of single nucleotide polymorphisms (SNPs) in candidate-gene association studies of complex disease. Because genotyping every SNP within a given set of genes is prohibitively expensive, methods have been developed to find a subset of these SNPs that captures the same genetic diversity. In one such study, haplotype-tagging SNPs were subsequently genotyped in a larger case-control second-stage sample to examine the association with breast cancer [3].

Results

Discussion

Conclusion