Defining haplotype blocks and tag single-nucleotide polymorphisms in the human genome.

Thomas G Schulze, Nirmala Akula, Yu-Sheng Chen, Francis J McMahon, Fengzhu Sun, Kui Zhang

doi:10.1093/hmg/ddh035

Abstract

Recent studies suggest that the genome is organized into blocks of haplotypes, and efforts to create a genome-wide haplotype map of single-nucleotide polymorphisms (SNPs) are already underway. Haplotype blocks are defined algorithmically, and several algorithms have been proposed to date. However, little is known about their relative performance on real data or about the impact of allele frequencies and parameter choices on the detection of haplotype blocks and the markers that tag them. Here we present a formal comparison of two major algorithms, a linkage disequilibrium (LD)-based method and a dynamic programming algorithm (DPA), in three chromosomal regions differing in gene content and recombination rate. The two methods produced strikingly different results. DPA identified fewer and larger haplotype blocks, as well as a smaller set of tag SNPs, than the LD method. For both methods, the results were strongly dependent on the allele frequency. Decreasing the minor allele frequency led to up to a 3.7-fold increase in the number of haplotype blocks and tag SNPs. The definition of haplotype blocks and tag SNPs was also sensitive to parameter changes, but the results could not be reconciled simply by parameter adjustment. These results show that two major methods for detecting haplotype blocks and tag SNPs can produce different results on the same data and that these results are sensitive to marker allele frequencies and parameter choices. More information is needed to guide the choice of method, marker allele frequencies, and parameters in the development of a haplotype map.
