Abstract

Large registries of volunteer hematopoietic stem cell donors typed for HLA contain potentially valuable data for studying haplotype frequencies in the general population. However, the usual assumptions for using the expectation-maximization (EM) algorithm are typically violated in these registries. To avoid this problem, previous studies using registry data have reduced the HLA typings to low resolution and/or excluded subjects who were selected for testing on behalf of a specific patient ("patient-directed" typings). These restrictions, added to avoid bias from the selection of nonrepresentative volunteers for higher-resolution typing, have limited previous results to haplotypes defined at low resolution. In this article, we eliminate the need for such restrictions by formally relaxing the assumptions necessary for the EM algorithm. We show mathematically and through simulation that varying levels of resolution can be incorporated even if the level of typing resolution is chosen based on the HLA type. This allows intermediate- and high-resolution data from patient-directed typings to be used to extend haplotype frequency estimates to the allele level for HLA-DRB1. We demonstrate the feasibility of applying this computationally demanding algorithm to large datasets by running it on more than 3 million volunteers listed in the National Marrow Donor Program Registry.
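For readers unfamiliar with the estimator, the following is a minimal Python sketch of the standard EM haplotype-frequency update under Hardy-Weinberg equilibrium. It assumes each volunteer's typing has already been expanded into a list of compatible unphased genotypes (unordered pairs of haplotype labels), so a lower-resolution typing is represented simply as a longer list of compatible pairs. The function name `em_haplotype_freqs` and this input layout are hypothetical; the sketch does not implement the relaxed assumptions derived in the article, nor the engineering needed to run at registry scale.

```python
from collections import defaultdict


def em_haplotype_freqs(typings, n_iter=200, tol=1e-10):
    """Hypothetical EM sketch for haplotype frequencies.

    typings: one entry per subject; each entry is a list of compatible
    unordered genotypes (h1, h2). Lower-resolution typings list more pairs.
    """
    # Start from uniform frequencies over every haplotype that appears
    # in at least one compatible genotype.
    haps = sorted({h for subj in typings for g in subj for h in g})
    freq = {h: 1.0 / len(haps) for h in haps}
    n = len(typings)

    for _ in range(n_iter):
        counts = defaultdict(float)
        for subj in typings:
            # E-step: weight each compatible genotype by its probability
            # under Hardy-Weinberg equilibrium (2pq for heterozygotes).
            weights = []
            for h1, h2 in subj:
                w = freq[h1] * freq[h2]
                if h1 != h2:
                    w *= 2.0
                weights.append(w)
            total = sum(weights)
            # Distribute the subject's two haplotype copies in proportion
            # to the posterior probability of each compatible genotype.
            for (h1, h2), w in zip(subj, weights):
                counts[h1] += w / total
                counts[h2] += w / total

        # M-step: expected haplotype counts divided by 2n haplotype copies.
        new = {h: counts[h] / (2 * n) for h in haps}
        delta = max(abs(new[h] - freq[h]) for h in haps)
        freq = new
        if delta < tol:
            break
    return freq


if __name__ == "__main__":
    # Toy data: A/A, A/B, and one subject typed too coarsely to
    # distinguish A/A from A/B.
    toy = [[("A", "A")], [("A", "B")], [("A", "A"), ("A", "B")]]
    print(em_haplotype_freqs(toy))  # A ≈ 0.771, B ≈ 0.229
```

In the toy input, the ambiguous subject's contribution depends on the current estimate, which is what makes the procedure iterative: the fixed point for the frequency p of A solves 6p² − 15p + 8 = 0, giving p = (15 − √33)/12 ≈ 0.771. When every typing is unambiguous, the update reduces to ordinary allele counting after a single iteration.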
