Human leukocyte antigen (HLA) genes in the major histocompatibility complex (MHC) region are crucial for immunity and are associated with numerous diseases and phenotypes. The MHC region's complexity and high genetic diversity make it challenging to analyze using short-read sequencing (SRS) technology. We sequence the MHC region of 100 Han Chinese individuals using both long-read sequencing (LRS) and SRS platforms at approximately 30X coverage to study genetic alterations and their potential functional impacts. LRS provides significantly greater coverage of the MHC region and eight classical HLA genes, particularly at the HLA-DRB1 locus, compared with SRS. We detect 78,249 single nucleotide polymorphisms (SNPs) using LRS, with 26.0% undetectable by SRS. Based on SNP and inferred HLA allele types, we construct an LRS-based MHC reference panel for the Han Chinese, containing approximately 2.6 times more genetic variants than the SRS-based Han-MHC reference panel. A phenome-wide association study assessing 26,024 phenotypes across 15 categories identifies significant associations for 7,879 independent variants (including 809 LRS-specific SNPs) with 409 phenotypes in nine categories. This analysis reveals 24 unreported HLA allele associations in the bioelectric and cellular categories. The conditional analysis identifies 530 independent signals across the 409 phenotypes, including 28 previously unreported signals of eight classical HLA genes associated with 33 phenotypes. Of the top-associated SNPs, 191 are detected by LRS only. Fine-mapping identifies 126 independent candidate causal SNPs for three immune-related cellular phenotypes, with 17 detected exclusively by LRS. Our study reveals previously unreported variants and their functional impacts in the MHC region, enhancing our understanding of genetic diversity and its potential biological implications in the Han Chinese population.