Abstract

Driven by the necessity to survive environmental pathogens, the human immune system has evolved exceptional diversity and plasticity, to which several factors contribute, including inheritable structural polymorphism of the underlying genes. Characterizing this variation is challenging due to the complexity of these loci, which contain extensive regions of paralogy, segmental duplication and high copy-number repeats, but recent progress in long-read sequencing and optical mapping techniques suggests this problem may now be tractable. Here we assess this by using long-read sequencing platforms from PacBio and Oxford Nanopore, supplemented with short-read sequencing and Bionano optical mapping, to sequence DNA extracted from CD14+ monocytes and peripheral blood mononuclear cells from a single European individual identified as HV31. We use these data to build a de novo assembly of eight genomic regions encoding four key components of the immune system, namely the human leukocyte antigen, immunoglobulins, T cell receptors, and killer-cell immunoglobulin-like receptors. Validation of our assembly using k-mer based and alignment approaches suggests that it has high accuracy, with estimated base-level error rates below 1 in 10 kb, although we identify a small number of remaining structural errors. We use the assembly to identify heterozygous and homozygous structural variation in comparison to GRCh38. Despite analyzing only a single individual, we find multiple large structural variants affecting core genes at all three immunoglobulin regions and at two of the three T cell receptor regions. Several of these variants are not accurately callable using current algorithms, implying that further methodological improvements are needed. Our results demonstrate that assessing haplotype variation in these regions is possible given sufficiently accurate long-read and associated data. Continued reductions in the cost of these technologies will enable application of these methods to larger samples and provide a broader catalogue of germline structural variation at these loci, an important step toward making these regions accessible to large-scale genetic association studies.

Highlights

  • The capability of the human immune system to respond to environmental pathogens results from its substantial diversity and variability, both among individuals within a population and among cells within a single host

  • We generated accurate assemblies by integrating multiple complementary data types, although we noted a small subset of locations that remain challenging. We found that this individual carries multiple structural differences, both between the two inherited chromosomes and relative to previously analyzed genomes, that affect the copy number of immune system genes

  • Key components of the innate and adaptive immune system, including the human leukocyte antigen (HLA), immunoglobulins (IG), T cell receptors (TCR) and killer-cell immunoglobulin-like receptors (KIR), have evolved exceptional complexity in their genomic loci, featuring numerous highly similar genes interspersed with pseudogenes and repetitive elements

Introduction

The capability of the human immune system to respond to environmental pathogens results from its substantial diversity and variability, both among individuals within a population and among cells within a single host. Key components of the innate and adaptive immune system, including the human leukocyte antigen (HLA), immunoglobulins (IG), T cell receptors (TCR) and killer-cell immunoglobulin-like receptors (KIR), have evolved exceptional complexity in their genomic loci, featuring numerous highly similar genes interspersed with pseudogenes and repetitive elements. The major histocompatibility complex (MHC), which encodes HLA, is so far the best-studied example, with hundreds of associations known across multiple classes of disease [1,2], including infections [3,4]. Despite the clearly important role of IG, TCR and KIR [7,8,9], the underlying complexity of these genomic regions has so far prevented a full analysis of their contribution to human disease.
