Abstract

Profiling immunoglobulin (Ig) receptor repertoires with specialized assays can be cost-ineffective and time-consuming. Here we report ImReP, a computational method for rapid and accurate profiling of the Ig repertoire, including the complementarity-determining region 3 (CDR3), using regular RNA sequencing data such as those from 8,555 samples across 53 tissue types from 544 individuals in the Genotype-Tissue Expression (GTEx v6) project. Using ImReP and GTEx v6 data, we generate a collection of 3.6 million Ig sequences, termed the atlas of immunoglobulin repertoires (TAIR), across a broad range of tissue types that often have no reported Ig repertoire information. We also evaluate the flow of Ig clonotypes and inter-tissue repertoire similarities across immune-related tissues. In summary, TAIR is one of the largest collections of CDR3 sequences and tissue types, and should serve as an important resource for studying immunological diseases.

Highlights

  • Profiling immunoglobulin (Ig) receptor repertoires with specialized assays can be cost-ineffective and time-consuming

  • Repertoire analysis from RNA sequencing (RNA-Seq) data typically starts with mapping the reads to the germline V, D, and J genes that can be obtained from the International ImMunoGeneTics (IMGT) database[11]

  • Our initial study demonstrates the ability of ImReP to efficiently extract Ig-derived reads from RNA-Seq data and accurately assemble the corresponding hypervariable region sequences
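The second highlight says repertoire analysis typically starts by mapping reads to germline V, D, and J genes. As a rough illustration of that first step only, the sketch below assigns each read to the germline gene with which it shares the most k-mers. This is not ImReP's algorithm, and real pipelines use proper sequence alignment; the function names, the k-mer size, and the sequences are all hypothetical (the sequences are made up and are not IMGT germline genes).

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(germline, k):
    """Map each k-mer to the set of germline gene names that contain it."""
    index = defaultdict(set)
    for name, seq in germline.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    return index


def best_gene(read, index, k):
    """Return the gene sharing the most k-mers with the read, or None."""
    counts = defaultdict(int)
    for kmer in kmers(read, k):
        # .get avoids creating empty entries in the defaultdict index
        for name in index.get(kmer, ()):
            counts[name] += 1
    if not counts:
        return None
    # sorted() makes tie-breaking deterministic (alphabetical order)
    return max(sorted(counts), key=counts.get)


# Toy, made-up sequences for illustration (NOT real germline genes).
germline = {
    "V_toy": "ACGTTGCAAGGCTTAC",
    "J_toy": "TTGGCCAATCGGATCC",
}
index = build_index(germline, k=6)

reads = ["GTTGCAAGGC", "CCAATCGGAT", "AAAAAAAAAA"]
assignments = [best_gene(r, index, k=6) for r in reads]
print(assignments)
```

The third read shares no k-mer with either toy gene and is left unassigned, which mirrors how reads that do not originate from the Ig locus would be filtered out at this stage.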


Summary

Introduction

Profiling immunoglobulin (Ig) receptor repertoires with specialized assays can be cost-ineffective and time-consuming. Upon activation of a B cell, somatic hypermutation further diversifies the variable region of the Ig. These changes are mostly single-base substitutions occurring at extremely high rates: somatic hypermutation introduces 10⁻⁵ to 10⁻³ mutations per base pair per generation[3]. Assay-based approaches provide a detailed view of the adaptive immune system by deep sequencing amplified DNA or RNA from the variable region of the Ig locus (BCR-Seq)[4,5,6]. Those technologies are usually restricted to one chain, with the majority of studies focusing on the heavy chain of the Ig repertoire. Existing methods that are capable of assembling Ig repertoires from bulk RNA sequencing (RNA-Seq) data typically produce low-accuracy results (F-score < 0.2).
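To make the accuracy figure above concrete, here is a minimal sketch of how an F-score could be computed for an assembled repertoire. It assumes the score is the harmonic mean of precision and recall over exact CDR3 sequence matches against a reference set; the excerpt does not state how the score was computed, so that definition, the function name, and the toy sequences are assumptions.

```python
def f_score(assembled, reference):
    """Harmonic mean of precision and recall over exact sequence matches.

    Assumption: a true positive is an assembled CDR3 sequence that appears
    verbatim in the reference set.
    """
    assembled, reference = set(assembled), set(reference)
    true_positives = len(assembled & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(assembled)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Made-up amino-acid strings standing in for CDR3 sequences.
reference = {"CARDYW", "CAKGGW", "CQQYNSYPLTF", "CARWGGDYW"}
assembled = {"CARDYW", "CAKGGW", "CASSLGW"}

score = f_score(assembled, reference)
print(round(score, 3))
```

In this toy case two of three assembled sequences are correct (precision 2/3) and two of four reference sequences are recovered (recall 1/2), giving an F-score of 4/7. A score below 0.2 means that precision, recall, or both are far lower than in this example.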

