Abstract
Aim
Haplotypes estimated with the Expectation–Maximization (EM) algorithm were compared with haplotypes inferred as identical by descent (IBD) from pedigrees in European affected family-based controls (AFBACs) genotyped by the Type 1 Diabetes Genetics Consortium (T1DGC) at two-field resolution for HLA-A, -B, -C, -DRB1, -DQA1, -DQB1, -DPA1 and -DPB1.

Methods
The T1DGC dataset includes a total of 3252 European AFBACs, and 1151 families had an AFBAC haplotype from both parents. In each family, the pair of AFBACs was combined to create a ‘synthetic genotype’, and PyPop was used to estimate pairwise haplotypes via EM, calculate linkage disequilibrium (LD) and evaluate adherence to Hardy–Weinberg expectations (HWE). Haplotype counts (n) and LD (D′) were compared between EM and IBD haplotypes, and LD was also evaluated in IBD and EM haplotypes generated for 1000 resampled subsets each of 100, 200, 400 and 800 AFBAC pairs.

Results
All loci met HWE. D′ was equivalent for IBD (mean D′ = 0.45) and EM (mean D′ = 0.46) haplotypes across all 1151 families. For IBD haplotypes, D′ decreased as the number of AFBAC pairs tested increased, with mean D′ of 0.62, 0.56, 0.51 and 0.47 for 100, 200, 400 and 800 pairs, respectively. For the 4000 EM haplotype replicates, mean D′ was 0.62, 0.56, 0.51 and 0.47 for 100, 200, 400 and 800 pairs, respectively. Overall, 5365 haplotypes were inferred (IBD) and 5123 were estimated (EM). HLA-A∼HLA-B haplotypes showed the largest difference in the number of unique EM and IBD haplotypes (362 EM vs 392 IBD). Up to 65% of EM haplotypes with n between 1 and 2 were not inferred. Up to 45% of IBD haplotypes with n = 1 and 33% of IBD haplotypes with n = 2 were not estimated. Although the fractions of EM haplotypes with over- and underestimated values of n did not differ significantly, the mean overestimated n differed significantly from the mean underestimated n (p = 2.24 × 10⁻⁶).

Conclusion
LD varies inversely with sample size, and this trend is more pronounced for EM than for IBD haplotypes. As a result, EM-based LD values for small populations are likely to be overestimated. Rare EM haplotypes are likely to be spurious, and rare IBD haplotypes are difficult to estimate; such rare haplotypes should be treated with caution when making haplotype inferences. The EM algorithm overestimates haplotype counts to a greater degree than it underestimates them.
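The sample-size dependence of D′ described above can be illustrated with a small simulation. This is a minimal Python sketch, not PyPop's implementation: it assumes the multi-allelic D′ is Hedrick's weighted summary (each allele pair's |D′ij| weighted by the product of the two allele frequencies), it counts phased haplotypes directly (as for IBD haplotypes) rather than running EM, and the simulated allele numbers, frequencies and function names (`hedrick_dprime`, `simulate_unlinked_families`, `mean_dprime_by_size`) are hypothetical. Because the two simulated loci are unlinked, the true D′ is zero, so any positive mean D′ reflects sampling alone.

```python
import random
from collections import Counter
from statistics import mean


def hedrick_dprime(haplotypes):
    """Hedrick's overall D' for a list of phased two-locus haplotypes (a, b)."""
    n = len(haplotypes)
    hap_freq = {h: c / n for h, c in Counter(haplotypes).items()}
    p = {a: c / n for a, c in Counter(a for a, _ in haplotypes).items()}
    q = {b: c / n for b, c in Counter(b for _, b in haplotypes).items()}
    total = 0.0
    for a, pa in p.items():
        for b, qb in q.items():
            d = hap_freq.get((a, b), 0.0) - pa * qb  # D_ij = P_ij - p_i q_j
            if d < 0:
                dmax = min(pa * qb, (1 - pa) * (1 - qb))
            else:
                dmax = min(pa * (1 - qb), (1 - pa) * qb)
            if dmax > 0:
                # Weight |D'_ij| by the allele-frequency product p_i q_j.
                total += pa * qb * abs(d) / dmax
    return total


def simulate_unlinked_families(n_families, n_a=20, n_b=30, seed=0):
    """Hypothetical data: each family contributes two AFBAC-like haplotypes
    drawn at two loci with no true association (independent alleles)."""
    rng = random.Random(seed)
    wa = [rng.random() for _ in range(n_a)]
    wb = [rng.random() for _ in range(n_b)]
    alleles_a = [f"A*{i:02d}" for i in range(n_a)]
    alleles_b = [f"B*{j:02d}" for j in range(n_b)]

    def draw_haplotype():
        return (rng.choices(alleles_a, weights=wa)[0],
                rng.choices(alleles_b, weights=wb)[0])

    return [(draw_haplotype(), draw_haplotype()) for _ in range(n_families)]


def mean_dprime_by_size(families, sizes, replicates=1000, seed=1):
    """Resample subsets of families (pairs) without replacement and
    return the mean D' over replicates for each subset size."""
    rng = random.Random(seed)
    result = {}
    for k in sizes:
        values = []
        for _ in range(replicates):
            subset = rng.sample(families, k)
            haplotypes = [h for pair in subset for h in pair]
            values.append(hedrick_dprime(haplotypes))
        result[k] = mean(values)
    return result


if __name__ == "__main__":
    families = simulate_unlinked_families(1151)
    sizes = (100, 200, 400, 800)
    for k, v in mean_dprime_by_size(families, sizes, replicates=200).items():
        print(f"{k} pairs: mean D' = {v:.3f}")
```

Under these assumptions, mean D′ should shrink as the number of resampled pairs grows, mirroring the direction of the trend reported above; the simulated values themselves are not comparable to the T1DGC results.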