Abstract

Large-scale study of the properties of T-cell receptor (TCR) and B-cell receptor (BCR) repertoires through next-generation sequencing is providing valuable insights into adaptive immune responses. Variable (diversity) joining [V(D)J] germline genes and alleles must be characterized in detail to facilitate repertoire analyses. However, most species do not have well-characterized TCR/BCR germline genes because of their high homology. In addition, many germline alleles remain uncharacterized for humans and other species, which limits the capacity for studying immune repertoires. Herein, we developed “Immune Germline Prediction” (IMPre), a tool for predicting germline V/J genes and alleles using deep-sequencing data derived from TCR/BCR repertoires. We developed a new algorithm, “Seed_Clust,” for clustering, produced a multiway tree for assembly, and optimized the sequence according to the characteristics of rearrangement. We trained IMPre on human samples of T-cell receptor beta (TRB) and immunoglobulin heavy chain (IGH) and then tested it on additional human samples. Accuracies of 97.7%, 100%, 92.9%, and 100% were obtained for TRBV, TRBJ, IGHV, and IGHJ, respectively. Analyses of subsampling performance for these samples showed IMPre to be robust across different data quantities. Subsequently, IMPre was tested on samples from rhesus monkeys and on human long sequences: the highly accurate results demonstrated IMPre to be stable across animal samples and multiple data types. With the rapid accumulation of high-throughput sequence data for TCR and BCR repertoires, IMPre can be applied broadly for obtaining novel genes and a large number of novel alleles. IMPre is available at https://github.com/zhangwei2015/IMPre.

Highlights

  • The “immune repertoire” is defined as the collection of diverse T-cell receptors (TCRs) and B-cell receptors (BCRs) created by somatic recombination of many germline V, D, J, and C gene segments

  • We found that more than 50% of V deletion lengths for T-cell receptor beta (TRB) and 70% for immunoglobulin heavy chain (IGH) were within 1 bp, whereas the length of the J deletion was much more diverse; the proportion tended to decrease as the deletion length increased

  • The assembly step used Ar and Ur to judge whether a sequence extension should continue or terminate. To train these two parameters, we calculated the Ar and Ur for the true germline sequence (TGS) and error germline sequence (EGS) in each cluster output by the clustering step of “Immune Germline Prediction” (IMPre)
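
The deletion-length observation in the highlights can be illustrated with a minimal sketch. It is not part of IMPre: the function names, the toy data, and the reading of "within 1 bp" as a deletion length of at most 1 bp are all assumptions made for illustration.

```python
from collections import Counter


def deletion_length_distribution(deletion_lengths):
    """Return {length_bp: fraction} for a list of non-negative deletion lengths."""
    if not deletion_lengths:
        raise ValueError("need at least one deletion length")
    counts = Counter(deletion_lengths)
    total = len(deletion_lengths)
    return {length: counts[length] / total for length in sorted(counts)}


def fraction_within(deletion_lengths, max_bp=1):
    """Fraction of observations whose deletion length is <= max_bp.

    Assumption: "within 1 bp" is read here as a deletion of 0 or 1 bp.
    """
    if not deletion_lengths:
        raise ValueError("need at least one deletion length")
    hits = sum(1 for d in deletion_lengths if d <= max_bp)
    return hits / len(deletion_lengths)


# Hypothetical toy values, NOT data from the paper.
toy_v_deletions = [0, 0, 1, 0, 1, 2, 0, 5, 1, 0]

if __name__ == "__main__":
    print(deletion_length_distribution(toy_v_deletions))
    print(fraction_within(toy_v_deletions, max_bp=1))
```

Applied to real V and J deletion lengths taken from annotated rearrangements, the same two functions would show whether the V distribution is concentrated near zero while the J distribution is spread more widely.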


Summary

Introduction

The “immune repertoire” is defined as the collection of diverse T-cell receptors (TCRs) and B-cell receptors (BCRs) created by somatic recombination of many germline V (variable), D (diversity), J (joining), and C (constant) gene segments. Deep sequencing of the immune repertoire (Rep-seq) allows researchers to study the repertoire in a more comprehensive way. Well-characterized TCR/BCR germline genes are critical for the analysis and interpretation of Rep-seq data. The publicly available ImMunoGeneTics (IMGT) database collects the genes of certain species. Such information is not available for most species, which makes studying repertoires highly challenging (if not unattainable). Deciphering TCR and BCR germline loci requires additional resource-intensive efforts beyond conventional sequencing of the whole genome because these loci comprise multiple highly homologous and polymorphic gene family members. Like the gene loci of human leukocyte antigens, germline genes exhibit high allelic polymorphism. Well-characterized TCR/BCR germline alleles (polymorphisms) are critical for Rep-seq analyses.

