Abstract

Abs are immune system proteins that recognize noxious molecules for elimination. Their sequence diversity and binding versatility have made Abs the primary class of biopharmaceuticals. Recently, it has become possible to query their immense natural diversity using next-generation sequencing of Ig gene repertoires (Ig-seq). However, Ig-seq outputs are currently fragmented across repositories and tend to be presented as raw nucleotide reads, which means nontrivial effort is required to reuse the data for analysis. To address this issue, we have collected Ig-seq outputs from 55 studies, covering more than half a billion Ab sequences across diverse immune states, organisms (primarily human and mouse), and individuals. We have sorted, cleaned, annotated, translated, and numbered these sequences and make the data available via our Observed Antibody Space (OAS) resource at http://antibodymap.org. The data within OAS will be regularly updated with newly released Ig-seq datasets. We believe OAS will facilitate data mining of immune repertoires for improved understanding of the immune system and development of better biotherapeutics.

Highlights

  • All raw nucleotide reads were converted into amino acids using

  • As well as providing International ImMunoGeneTics information system (IMGT) and gene annotations, ANARCI acts as a broad-brush filter of Ab sequences that are likely to be erroneous

Summary

Materials and Methods

A list of study accession codes of publicly available Ig-seq datasets was obtained via a literature review. Sequences whose Smith–Waterman algorithm score was below the threshold for all isotypes were assigned as “bulk.” The robustness of this protocol was confirmed on the author-annotated Ig-seq datasets [18, 39, 40], in which it resulted in 99% accurate annotations. To streamline updating OAS with new data, we have generated a procedure to automatically identify Ig-seq datasets from raw sequence read archives. We apply our Ab annotation protocol to each raw nucleotide dataset deposited in the National Center for Biotechnology Information/European Nucleotide Archive repositories; if we find more than 10,000 Ab sequences in any given dataset, it is set aside for manual inspection. Manual inspection is still necessary to efficiently assign metadata, as these are currently deposited in a nonstandardized manner. This procedure allows for automatic identification of new Ig-seq datasets and semiautomatic updating of OAS.
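
The two decision rules described above (thresholded isotype assignment and flagging datasets for manual inspection) can be illustrated with a minimal Python sketch. This is not the OAS implementation: the function names, the scoring parameters (match, mismatch, gap), the score threshold, and the toy reference strings are all assumptions made for illustration. Only the "bulk" fallback and the 10,000-sequence cutoff come from the text.

```python
from typing import Dict


def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty).

    The scoring values are illustrative assumptions, not those used for OAS.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def assign_isotype(read_segment: str, references: Dict[str, str],
                   threshold: int) -> str:
    """Return the best-scoring isotype, or "bulk" if every score is below threshold."""
    if not references:
        return "bulk"
    scores = {name: smith_waterman_score(read_segment, ref)
              for name, ref in references.items()}
    best_name = max(scores, key=scores.get)
    return best_name if scores[best_name] >= threshold else "bulk"


def needs_manual_inspection(n_ab_sequences: int, cutoff: int = 10_000) -> bool:
    """Flag a dataset when it contains more than `cutoff` Ab sequences."""
    return n_ab_sequences > cutoff


# Hypothetical toy references; real constant-region sequences would go here.
TOY_REFERENCES = {"toy_A": "ACGTACGTAC", "toy_B": "TTGGCCAATT"}

if __name__ == "__main__":
    print(assign_isotype("GGACGTACGTACGG", TOY_REFERENCES, threshold=16))
    print(assign_isotype("CCCCCCCC", TOY_REFERENCES, threshold=16))
    print(needs_manual_inspection(25_000))
```

In practice a tested alignment library would replace the hand-written scoring loop, and the threshold would need to be calibrated against author-annotated datasets, as the text describes.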

Results
Discussion