Abstract

The major histocompatibility complex (MHC) is one of the most variable and gene-dense regions of the human genome. Most studies of the MHC and associated regions focus on minor variants and HLA types, many of which have been shown to be associated with human disease susceptibility and metabolic pathways. However, variant detection in the MHC region and diagnostic HLA typing still lack a coherent, standardized, cost-effective, high-coverage protocol of clinical quality and reliability. In this paper, we present such a method for the accurate detection of minor variants and HLA types in the human MHC region, using high-throughput, high-coverage sequencing of target regions. A probe set was designed using the 8 annotated human MHC haplotypes as templates, and encompasses the 5 megabases (Mb) of the extended MHC region. We evaluated the probe set on three genetically diverse human samples; the sequencing data show that ∼97% of the MHC region, and over 99% of the genes in the MHC region, are covered with sufficient depth and good evenness. Of the genotypes called from this capture sequencing data, 98% are consistent with established HapMap genotypes. We have concurrently developed a one-step pipeline for calling any HLA type referenced in the IMGT/HLA database from this target capture sequencing data, which shows over 96% typing accuracy at 4-digit resolution. This cost-effective and highly accurate approach to variant detection and HLA typing in the MHC region may lend further insight into studies of immune-mediated diseases, and may find clinical utility in transplantation medicine research. The one-step pipeline is released for general evaluation and use by the scientific community.
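To make the notion of typing accuracy at 4-digit resolution concrete, the following is a minimal sketch, not the released pipeline. It assumes allele names in the colon-delimited nomenclature (for example `A*02:01:01:01`), where 4-digit resolution corresponds to the first two fields. The function names `to_four_digit` and `typing_accuracy` are hypothetical, and the sketch assumes that the called and reference alleles are supplied in matching order.

```python
def to_four_digit(allele: str) -> str:
    """Truncate an HLA allele name to its first two fields (4-digit resolution).

    Assumes colon-delimited names such as 'HLA-A*02:01:01:01' or 'A*02:01'.
    Expression suffixes on a two-field name (e.g. 'N') are not handled here.
    """
    name = allele.strip()
    if name.upper().startswith("HLA-"):
        name = name[4:]  # drop the optional 'HLA-' prefix
    gene, _, fields = name.partition("*")
    two_fields = fields.split(":")[:2]
    return f"{gene}*{':'.join(two_fields)}"


def typing_accuracy(called, truth):
    """Fraction of alleles whose 4-digit call matches the reference typing.

    Assumes both lists are the same length and aligned allele by allele.
    """
    if len(called) != len(truth):
        raise ValueError("called and truth must have the same length")
    if not called:
        raise ValueError("no alleles to compare")
    matches = sum(
        to_four_digit(c) == to_four_digit(t) for c, t in zip(called, truth)
    )
    return matches / len(called)
```

For example, `typing_accuracy(["A*02:01:01", "B*07:02"], ["A*02:01", "B*08:01"])` compares `A*02:01` with `A*02:01` (a match) and `B*07:02` with `B*08:01` (a mismatch).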

Highlights

  • The major histocompatibility complex (MHC) region, one of the most gene-dense regions of the human genome, is located on the short arm of human chromosome 6

  • After filtering out reads with low sequence quality or adaptor contamination, the remaining data are mapped to the human genome reference sequence hg19, and more than 60% of the mapped reads align to the MHC region (Table 1)

  • We investigated the 3–4% of the region that was uncovered and found that more than 99% of the uncovered bases were located in long repeat regions of >2000 bases per repeat, which are difficult to resolve with a capture method because next-generation sequencing (NGS) reads are typically too short to span a long repeat
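The coverage figures in these highlights can be illustrated with a short sketch. It assumes a list of per-base read depths over the target region (for instance, parsed from the output of a depth tool) and a minimum-depth threshold `min_depth` chosen by the analyst; the function names and the threshold are assumptions for illustration and are not taken from the paper.

```python
def coverage_breadth(depths, min_depth=1):
    """Fraction of target bases covered by at least `min_depth` reads."""
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


def uncovered_intervals(depths, min_depth=1):
    """Return half-open (start, end) index ranges where depth < min_depth."""
    intervals = []
    start = None
    for pos, depth in enumerate(depths):
        if depth < min_depth:
            if start is None:
                start = pos  # open a new uncovered run
        elif start is not None:
            intervals.append((start, pos))  # close the current run
            start = None
    if start is not None:
        intervals.append((start, len(depths)))  # run extends to the end
    return intervals
```

The intervals returned by `uncovered_intervals` could then be intersected with a repeat annotation to check how many uncovered bases fall within long repeats.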

Introduction

The MHC region, one of the most gene-dense regions of the human genome, is located on the short arm of human chromosome 6. In addition to its high gene density, the MHC region is one of the most complex regions in the human genome, owing to its extremely high density of polymorphism and linkage disequilibrium (LD). This inherent complexity makes it challenging to identify, by genome-wide association study (GWAS), the underlying causative variants that contribute to disease phenotypes. Recognizing the importance of fully informative polymorphism and haplotype maps of the MHC region for MHC-related diseases, the MHC Haplotype Consortium conducted the MHC Haplotype Project between 2000 and 2006 and provided the sequences and annotations of eight different HLA-homozygous haplotypes (PGF, COX, QBL, APD, DBB, MANN, MCF and SSTO) [6].
