Abstract

Background: The human genome contains millions of common single nucleotide polymorphisms (SNPs), and these SNPs play an important role in understanding the association between genetic variations and human diseases. Many SNPs show correlated genotypes, or linkage disequilibrium (LD), so it is not necessary to genotype all SNPs for an association study. Many algorithms have been developed to find a small subset of SNPs, called tag SNPs, that is sufficient to infer all the other SNPs. Algorithms based on the r² LD statistic have gained popularity because r² is directly related to the statistical power to detect disease associations. Most existing r²-based algorithms use pairwise LD. Recent studies show that multi-marker LD can help further reduce the number of tag SNPs. However, existing tag SNP selection algorithms based on multi-marker LD are both time-consuming and memory-consuming. They cannot work on chromosomes containing more than 100 k SNPs using length-3 tagging rules.

Results: We propose an efficient algorithm called FastTagger to calculate multi-marker tagging rules and select tag SNPs based on multi-marker LD. FastTagger uses several techniques to reduce running time and memory consumption. Our experimental results show that FastTagger is several times faster than existing multi-marker based tag SNP selection algorithms while consuming much less memory. As a result, FastTagger can work on chromosomes containing more than 100 k SNPs using length-3 tagging rules. FastTagger also produces smaller sets of tag SNPs than existing multi-marker based algorithms, with a reduction ratio ranging from 3% to 9% when length-3 tagging rules are used. The generated tagging rules can also be used for genotype imputation. We studied the prediction accuracy of individual rules, and the average accuracy is above 96% when r² ≥ 0.9.

Conclusions: Generating multi-marker tagging rules is a computation-intensive task, and it is the bottleneck of existing multi-marker based tag SNP selection methods. FastTagger is a practical and scalable algorithm to solve this problem.

Highlights

  • The human genome contains millions of common single nucleotide polymorphisms (SNPs), and these SNPs play an important role in understanding the association between genetic variations and human diseases

  • Among the genome-wide approaches, those based on the r² linkage disequilibrium statistic have gained increasing popularity recently because r² is directly related to the statistical power to detect disease associations [14]

  • FastTagger can work on chromosomes containing more than 100 k SNPs with as little as 50 MB of memory, while existing algorithms consume more than 1 GB of memory even on chromosomes containing around 30 k SNPs


Summary

Introduction

The human genome contains millions of common single nucleotide polymorphisms (SNPs), and these SNPs play an important role in understanding the association between genetic variations and human diseases. Recent studies show that multi-marker LD can help further reduce the number of tag SNPs. However, existing tag SNP selection algorithms based on multi-marker LD are both time-consuming and memory-consuming. They cannot work on chromosomes containing more than 100 k SNPs using length-3 tagging rules. The genome-wide tag SNP selection algorithms do not need to partition the whole chromosome into blocks, and they utilize linkage disequilibrium among nearby SNPs to find tag SNPs. Among the genome-wide approaches, those based on the r² linkage disequilibrium statistic have gained increasing popularity recently because r² is directly related to the statistical power to detect disease associations [14].
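To make the idea of r²-based tagging concrete, the following is a minimal sketch of the simpler pairwise case only. It is not FastTagger and does not compute multi-marker tagging rules. It assumes phased haplotypes encoded as 0/1 alleles per SNP and uses the standard pairwise definition r² = D² / (pA(1 − pA) pB(1 − pB)) with D = pAB − pA·pB. The function names, the toy data, and the greedy cover strategy are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations


def r_squared(a, b):
    """Pairwise r^2 between two biallelic SNPs.

    a, b: equal-length lists of 0/1 alleles, one entry per phased haplotype.
    """
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("SNPs must cover the same non-empty set of haplotypes")
    p_a = sum(a) / n                                  # frequency of allele 1 at SNP a
    p_b = sum(b) / n                                  # frequency of allele 1 at SNP b
    p_ab = sum(x & y for x, y in zip(a, b)) / n       # frequency of the 1-1 haplotype
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # assumption: a monomorphic SNP is treated as untaggable
    d = p_ab - p_a * p_b                              # LD coefficient D
    return d * d / denom


def greedy_pairwise_tags(haplotypes, threshold=0.9):
    """Pick tag SNPs greedily so every SNP has r^2 >= threshold with some tag.

    haplotypes: dict mapping a (hypothetical) SNP id to its 0/1 allele list.
    This checks all pairs, so it only suits small illustrative inputs.
    """
    snps = list(haplotypes)
    covers = {s: {s} for s in snps}                   # every SNP tags itself
    for s, t in combinations(snps, 2):
        if r_squared(haplotypes[s], haplotypes[t]) >= threshold:
            covers[s].add(t)
            covers[t].add(s)

    untagged = set(snps)
    tags = []
    while untagged:
        # Choose the SNP that covers the most still-untagged SNPs.
        best = max(snps, key=lambda s: len(covers[s] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags


toy = {
    "snp1": [0, 0, 1, 1, 0, 1],
    "snp2": [0, 0, 1, 1, 0, 1],   # identical to snp1
    "snp3": [1, 1, 0, 0, 1, 0],   # complement of snp1
    "snp4": [0, 1, 0, 1, 0, 1],   # weakly correlated with the others
}
tags = greedy_pairwise_tags(toy, threshold=0.9)
```

On this toy input, snp1, snp2, and snp3 are in perfect pairwise LD, so one of them can stand in for all three, while snp4 must be kept as its own tag. Multi-marker rules extend this by letting a combination of two or three tag SNPs predict a target SNP, which is the computationally expensive step that the abstract identifies as the bottleneck.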

