Abstract

Significant interest has emerged in mapping genetic susceptibility for complex traits through whole-genome association studies. These studies rely on the extent of association, i.e., linkage disequilibrium (LD), between single nucleotide polymorphisms (SNPs) across the human genome. LD describes the nonrandom association between SNP pairs and can be used as a metric when designing maximally informative panels of SNPs for association studies in human populations. Using data from the 1.58 million SNPs genotyped by Perlegen, we explored the allele frequency dependence of the LD statistic r² both empirically and theoretically. We show that average r² values between SNPs unmatched for allele frequency are always limited to much less than 1 (theoretical approximately 0.46 to 0.57 for this dataset). Frequency matching of SNP pairs provides a more sensitive measure for assessing the average decay of LD and generates average r² values across nearly the entire informative range (from 0 to 0.89 through 0.95). Additionally, we analyzed the extent of perfect LD (r² = 1.0) using frequency-matched SNPs and found significant differences in the extent of LD in genic regions versus intergenic regions. The SNP pairs exhibiting perfect LD showed a significant bias for derived, nonancestral alleles, providing evidence for positive natural selection in the human genome.
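
The frequency dependence of r² described above can be illustrated with a short sketch. It uses the standard textbook definition r² = D² / (pA(1 − pA) pB(1 − pB)), where D = pAB − pA·pB, and is not taken from the authors' analysis code. The function and parameter names (`r_squared`, `max_r_squared`, `p_ab`, `p_a`, `p_b`) are hypothetical, chosen only for this illustration.

```python
def r_squared(p_ab, p_a, p_b):
    """r^2 for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B (each strictly between 0 and 1)
    """
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def max_r_squared(p_a, p_b):
    """Largest r^2 attainable for the given pair of allele frequencies."""
    # Largest positive D: the AB haplotype is as common as the rarer allele allows.
    d_max_pos = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    # Largest |D| for negative D: the AB haplotype is as rare as possible.
    d_max_neg = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_max = max(d_max_pos, d_max_neg)
    return d_max * d_max / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Matched frequencies can reach r^2 = 1; mismatched frequencies cannot.
matched = max_r_squared(0.3, 0.3)      # approximately 1.0
mismatched = max_r_squared(0.1, 0.5)   # approximately 0.111 (1/9)
```

Under this definition, r² can equal 1 only when the two SNPs have the same allele frequency (or complementary frequencies), which is consistent with the observation that averages over unmatched pairs stay well below 1.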

Highlights

  • The identification of more than 10 million single nucleotide polymorphisms (SNPs) in the National Center for Biotechnology Information SNP database (dbSNP) provides an extensive resource for human genetic analysis

  • Using the 1.58 million SNPs genotyped by Perlegen [1], we explored the impact of allele frequency in calculating genome-wide linkage disequilibrium (LD)

  • We explore the decay of LD as a function of physical distance and SNP allele frequency
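
One way to summarize LD decay for frequency-matched SNP pairs is to average r² within physical-distance bins. The sketch below is an illustration under stated assumptions, not the authors' pipeline: the function name `mean_r2_by_distance`, the 10 kb bin size, the 0.05 frequency-difference tolerance, and the toy input values are all hypothetical.

```python
from collections import defaultdict


def mean_r2_by_distance(pairs, bin_size=10_000, max_freq_diff=0.05):
    """Average r^2 per distance bin, keeping only frequency-matched pairs.

    pairs: iterable of (distance_bp, freq_1, freq_2, r2) tuples
    bin_size: width of each distance bin in base pairs (assumed value)
    max_freq_diff: largest allele frequency difference still counted
                   as "matched" (assumed value)
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for distance_bp, freq_1, freq_2, r2 in pairs:
        if abs(freq_1 - freq_2) > max_freq_diff:
            continue  # skip pairs that are not frequency matched
        bin_start = (distance_bp // bin_size) * bin_size
        sums[bin_start] += r2
        counts[bin_start] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}


# Toy values invented for illustration only; they are not data from the study.
toy_pairs = [
    (2_000, 0.30, 0.31, 0.9),
    (8_000, 0.20, 0.22, 0.7),
    (15_000, 0.40, 0.10, 0.05),  # dropped: frequencies differ by 0.30
    (18_000, 0.25, 0.25, 0.4),
]
decay = mean_r2_by_distance(toy_pairs)
```

The returned dictionary maps the start of each distance bin (in base pairs) to the mean r² of the matched pairs falling in that bin.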

Introduction

Other large sample sets exist with full genotyping data, such as the approximately 1.1 million SNP genotypes generated in Phase 1 of the International HapMap Project in 30 parent-child trios of European descent, 30 parent-child trios of African descent, and 45 and 44 unrelated individuals of Chinese and Japanese descent, respectively [2]. These genome-wide SNP datasets provide a resource for analyzing genome-wide linkage disequilibrium (LD) structure when selecting SNPs for association studies [1,3,4,5,6]. The future of whole-genome association studies will rely on LD extending over substantial physical distances, both to identify a causative marker (or genomic interval) even if it is not directly genotyped in a study [11,12] and to select maximally informative markers and decrease genotyping cost.
