Abstract
The Haplotype Map (HapMap) project recently generated genotype data for more than 1 million single-nucleotide polymorphisms (SNPs) in four population samples. The main application of the data is in the selection of tag single-nucleotide polymorphisms (tSNPs) to use in association studies. The usefulness of this selection process needs to be verified in populations outside those used for the HapMap project. In addition, it is not known how well the data represent the general population, as only 90–120 chromosomes were used for each population and since the genotyped SNPs were selected so as to have high frequencies. In this study, we analyzed more than 1,000 individuals from Estonia. The population of this northern European country has been influenced by many different waves of migrations from Europe and Russia. We genotyped 1,536 randomly selected SNPs from two 500-kbp ENCODE regions on Chromosome 2. We observed that the tSNPs selected from the CEPH (Centre d'Etude du Polymorphisme Humain) from Utah (CEU) HapMap samples (derived from US residents with northern and western European ancestry) captured most of the variation in the Estonia sample. (Between 90% and 95% of the SNPs with a minor allele frequency of more than 5% have an r² of at least 0.8 with one of the CEU tSNPs.) Using the reverse approach, tags selected from the Estonia sample could almost equally well describe the CEU sample. Finally, we observed that the sample size, the allelic frequency, and the SNP density in the dataset used to select the tags each have important effects on the tagging performance. Overall, our study supports the use of HapMap data in other Caucasian populations, but the SNP density and the bias towards high-frequency SNPs have to be taken into account when designing association studies.
Highlights
The main objective of the Haplotype Map (HapMap) project is to provide the research community with a description of the linkage disequilibrium (LD) structure of the human genome in order to enable the optimization of the single-nucleotide polymorphism (SNP) selection process for association studies [1].
The two ENCODE regions studied have previously been resequenced in their entirety in 48 individuals, and all SNPs were genotyped as part of the HapMap project.
We used a population sample with mixed European ancestry to evaluate the usefulness of the HapMap project.
Summary
The main objective of the HapMap project is to provide the research community with a description of the linkage disequilibrium (LD) structure of the human genome in order to enable the optimization of the single-nucleotide polymorphism (SNP) selection process for association studies [1]. Many algorithms have been described that can minimize the number of SNPs required (i.e., tag single-nucleotide polymorphisms [tSNPs]) to adequately represent the genetic variation across a specific region (or the entire genome). Initially, these algorithms were mostly based on the concept of common haplotypes within haplotype blocks (haplotype tag single-nucleotide polymorphisms [htSNPs]) [2,3,4,5,6,7]. They have many limitations, because the haplotype block boundaries vary with the SNP density used and between sample sets, making it difficult to compare and adapt the approach to different populations [8,9]. The most commonly used tagging algorithms employ only the LD properties of the SNPs (or haplotypes), such as r², an approach that is entirely independent of the haplotype block concept [10,11,12]. Because the sample size required to detect a disease variant through a tag with comparable power increases in inverse proportion to the r² between the tag and the variant, the use of those algorithms facilitates the study design.
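The quantities discussed above can be illustrated with a minimal Python sketch. It is not the tagging procedure used in the study. It assumes phased haplotypes coded as 0/1 alleles, and the function names (`r_squared`, `tag_coverage`, `required_sample_size`) and the toy data are hypothetical, introduced only for illustration. The sketch computes pairwise r², the fraction of common SNPs captured by a set of tags at a given r² threshold, and the approximate sample-size inflation of 1/r² incurred by testing a tag instead of the variant itself.

```python
def r_squared(hap_a, hap_b):
    """Pairwise r^2 between two biallelic SNPs from phased 0/1 haplotypes."""
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype vectors must be non-empty and equal in length")
    p_a = sum(hap_a) / n                      # frequency of allele 1 at SNP A
    p_b = sum(hap_b) / n                      # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0                            # monomorphic SNP: r^2 undefined, treated as 0 here
    d = p_ab - p_a * p_b                      # LD coefficient D
    return d * d / denom


def minor_allele_frequency(hap):
    """Minor allele frequency of one SNP."""
    p = sum(hap) / len(hap)
    return min(p, 1 - p)


def tag_coverage(haplotypes, tag_ids, r2_threshold=0.8, maf_threshold=0.05):
    """Fraction of SNPs with MAF > maf_threshold that have
    r^2 >= r2_threshold with at least one tag."""
    common = [s for s, h in haplotypes.items()
              if minor_allele_frequency(h) > maf_threshold]
    if not common:
        return 0.0
    captured = sum(
        1 for s in common
        if any(r_squared(haplotypes[s], haplotypes[t]) >= r2_threshold
               for t in tag_ids)
    )
    return captured / len(common)


def required_sample_size(n_direct, r2):
    """Approximate sample size needed when testing a tag with the given r^2
    to the causal variant, relative to n_direct for the variant itself."""
    if not 0 < r2 <= 1:
        raise ValueError("r2 must be in (0, 1]")
    return n_direct / r2


# Toy data (hypothetical): 8 phased chromosomes, 4 SNPs.
toy_haplotypes = {
    "snp1": [0, 0, 0, 0, 1, 1, 1, 1],
    "snp2": [0, 0, 0, 0, 1, 1, 1, 1],   # in perfect LD with snp1
    "snp3": [0, 1, 0, 1, 0, 1, 0, 1],   # uncorrelated with snp1
    "snp4": [0, 0, 0, 0, 0, 0, 0, 0],   # monomorphic, excluded by the MAF filter
}

coverage = tag_coverage(toy_haplotypes, tag_ids=["snp1"])
print(f"coverage with snp1 as the only tag: {coverage:.2f}")
print(f"sample size at r^2 = 0.8: {required_sample_size(1000, 0.8):.0f}")
```

In this toy example the single tag captures itself and snp2 but not snp3, so two of the three common SNPs are covered. With real data, the genotypes would usually be unphased, and r² would be estimated with an established tool rather than from hand-coded haplotypes.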