Abstract

Summary: Antibody repertoires reveal insights into the biology of the adaptive immune system and empower diagnostics and therapeutics. Multiple tools are currently available for the annotation of antibody sequences. All downstream analyses, such as choosing lead drug candidates, depend on the correct annotation of these sequences; however, the performance of these tools has not been thoroughly compared. Here, we benchmark the performance of commonly used immunoinformatic tools, i.e. IMGT/HighV-QUEST, IgBLAST and MiXCR, in terms of reproducibility of annotation output, accuracy and speed using simulated and experimental high-throughput sequencing datasets. We analyzed changes in the IMGT reference germline database over the last 10 years in order to assess the reproducibility of the annotation output. We found that only 73/183 (40%) V, D and J human genes were shared between the reference germline sets used by the tools. We found that the annotation results differed between tools. In terms of alignment accuracy, MiXCR had the highest average frequency of gene mishits (0.02) and IgBLAST the lowest (0.004). Reproducibility in the output of complementarity-determining region 3 (CDR3) amino acid sequences ranged from 4.3% to 77.6% with preprocessed data. In addition, the run time of the tools was assessed: MiXCR was the fastest tool in terms of the number of sequences processed per unit of time. These results indicate that immunoinformatic analyses greatly depend on the choice of bioinformatics tool. Our results support informed decision-making by immunoinformaticians based on repertoire composition and sequencing platforms.

Availability and implementation: All tools utilized in the paper are free for academic use.

Supplementary information: Supplementary data are available at Bioinformatics online.

Highlights

  • Immunoinformatics has transformed the field of antibody discovery and diagnostics (Brown et al, 2019; Kidd et al, 2014; Miho et al, 2018; Robinson, 2015)

  • Unlike T-cell receptors (TCRs), B-cell clonotyping is based on clonal lineages and is usually limited to the heavy chain sequences due to the lesser degree of diversity present in the light chain (Collins et al, 2008; Yaari and Kleinstein, 2015) and because diversity in the CDR3 region of the heavy chain is sufficient for most antibody specificities (Xu and Davis, 2000)

  • We found that while IMGT/HighV-QUEST aligned the most V gene sequences in the Illumina MiSeq Dataset A (450 569 sequences) compared to MiXCR (155 774 sequences) and IgBLAST (200 164 sequences), IMGT/HighV-QUEST lost a significant portion of them (86.9%) during preprocessing compared to MiXCR (16.3%) and IgBLAST (52.8%) (Fig. 3)

Introduction

Immunoinformatics has transformed the field of antibody discovery and diagnostics (Brown et al, 2019; Kidd et al, 2014; Miho et al, 2018; Robinson, 2015). The literature contains many different definitions of clones in the context of B-cell receptor (BCR) repertoire sequencing. These definitions range from identical CDR3 amino acid (a.a.) sequences, to clusters of similar CDR3 sequences, to definitions that include the entire variable region (Greiff et al, 2015b; Hershberg and Luning Prak, 2015; Miho et al, 2018; Nouri and Kleinstein, 2018). Unlike T-cell receptors (TCRs), B-cell clonotyping is based on clonal lineages and is usually limited to the heavy chain sequences due to the lesser degree of diversity present in the light chain (Collins et al, 2008; Yaari and Kleinstein, 2015) and because diversity in the CDR3 region of the heavy chain is sufficient for most antibody specificities (Xu and Davis, 2000).
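
To make the difference between two of these clone definitions concrete, the following is a minimal Python sketch, not the method used by any of the benchmarked tools. It groups heavy-chain CDR3 a.a. sequences either by exact identity or by single-linkage clustering of similar, equal-length sequences. The function names, the normalized Hamming distance criterion and the 0.2 threshold are illustrative assumptions; the third definition, based on the entire variable region, is not covered.

```python
from collections import defaultdict


def clones_by_identical_cdr3(cdr3s):
    """Group sequence indices whose CDR3 a.a. strings are exactly identical."""
    groups = defaultdict(list)
    for idx, cdr3 in enumerate(cdr3s):
        groups[cdr3].append(idx)
    return list(groups.values())


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def clones_by_similar_cdr3(cdr3s, max_mismatch_fraction=0.2):
    """Single-linkage clustering of equal-length CDR3s.

    Two sequences are linked when their Hamming distance is at most
    max_mismatch_fraction * length (hypothetical threshold, chosen
    only for illustration).
    """
    parent = list(range(len(cdr3s)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(cdr3s)):
        for j in range(i + 1, len(cdr3s)):
            a, b = cdr3s[i], cdr3s[j]
            if len(a) == len(b) and hamming(a, b) <= max_mismatch_fraction * len(a):
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i in range(len(cdr3s)):
        groups[find(i)].append(i)
    return sorted(groups.values())


# Toy CDR3 sequences (made up for illustration, not from the paper's datasets).
example = ["CARDYW", "CARDYW", "CARDFW", "CAKGGGYW"]
identical_clones = clones_by_identical_cdr3(example)
similar_clones = clones_by_similar_cdr3(example)
```

On this toy input the identity-based definition keeps "CARDFW" as its own clone, whereas the similarity-based definition merges it with the two "CARDYW" sequences, which illustrates why clone counts depend on the definition chosen.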
