Abstract
Germline variations in immunoglobulin genes influence the repertoire of B cell receptors and antibodies, and such polymorphisms may impact disease susceptibility. However, knowledge of genomic variation in the immunoglobulin loci is scarce. Here, we report 25 potential novel germline IGHV alleles as inferred from rearranged naïve B cell cDNA repertoires of 98 individuals. Thirteen novel alleles were selected for validation, of which ten were confirmed by targeted amplification and Sanger sequencing of non-B cell DNA. Moreover, we detected a high degree of variability upstream of the V-REGION in the 5′UTR, L-PART1 and L-PART2 sequences, and found that identical V-REGION alleles can differ in upstream sequences. Thus, we have identified extensive genetic variation not only in the V-REGION but also in the upstream sequences of IGHV genes. Our findings provide a new perspective for annotating immunoglobulin repertoire sequencing data.
Highlights
Immunoglobulins are an important part of the adaptive immune system
We are the first to provide a comprehensive overview of the heavy chain upstream (5′UTR, L-PART1, and L-PART2) sequence variants in an AIRR-seq dataset
We validated ten of thirteen selected novel alleles by targeted amplification of genomic DNA from the same individuals
Summary
Immunoglobulins are an important part of the adaptive immune system. They exert their function either as the antigen receptor of B cells, which is essential for the antigen presentation capacity of these cells [1], or as secreted antibodies that survey the extracellular fluids of the body. The genes of the heavy chain are located on chromosome 14 (14q32.33) [3], while the light chain genes are present at two separate loci, kappa and lambda, located on chromosome 2 (2p11.2) and chromosome 22 (22q11.2), respectively [4]. These loci remain incompletely characterized because they contain many repetitive sequence segments with many duplicated genes [5], which makes it difficult to correctly assemble short reads from whole genome sequencing. To date, a limited number of genomically sequenced [6,7,8] and inferred [9,10] haplotypes of the heavy chain and the two light chain loci have been described. Different databases exist for genomic immune receptor DNA sequences (IMGT/GENE-DB [11]), putative novel variants from inferred data (IgPdb, https://cgi.cse.unsw.edu.au/~ihmmune/IgPdb/information.php) or entire immune receptor repertoires (OGRDB [12]).