CRIS: complete reconstruction of immunoglobulin V-D-J sequences from RNA-seq data.

Rashedul Islam, Andrew P Weng, Misha Bilenky, Joseph M Connors, Martin Hirst, Aida Ouangraoua

doi:10.1093/bioadv/vbab021

Abstract

Motivation
B cells display remarkable diversity in producing B-cell receptors through recombination of immunoglobulin (Ig) V-D-J genes. Somatic hypermutation (SHM) of immunoglobulin heavy chain variable (IGHV) genes is used as a prognostic marker in B-cell malignancies. Clinically, IGHV mutation status is determined by targeted Sanger sequencing, which is a resource-intensive and low-throughput procedure. Here, we describe a bioinformatic pipeline, CRIS (Complete Reconstruction of Immunoglobulin IGHV-D-J Sequences), that uses RNA sequencing (RNA-seq) datasets to reconstruct IGHV-D-J sequences and determine IGHV SHM status.

Results
CRIS extracts RNA-seq reads aligned to Ig gene loci, assembles Ig transcripts and aligns the resulting contigs to reference Ig sequences to enumerate and classify SHMs in the IGHV gene sequence. Existing tools infer the B-cell receptor repertoire from RNA-seq data using only a portion of the IGHV gene segment; CRIS improves on them by using de novo assembly to reconstruct the entire segment. We show that the SHM status identified by CRIS using the entire IGHV gene segment is highly concordant with clinical classification in three independent chronic lymphocytic leukemia patient cohorts.

Availability and implementation
The CRIS pipeline is available under the MIT License from https://github.com/Rashedul/CRIS.

Supplementary information
Supplementary data are available at Bioinformatics Advances online.
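The last step of the pipeline, enumerating SHMs in a reconstructed IGHV contig and calling mutation status, can be illustrated with a minimal sketch. It is not the CRIS implementation. It assumes the contig has already been aligned to its closest germline IGHV allele, so both sequences are equal-length strings with `-` marking gaps. It also assumes the conventional CLL cutoff, under which identity below 98% to germline is called mutated; the source does not state this threshold. The names `call_shm`, `ShmCall` and `MUTATED_CUTOFF` are hypothetical.

```python
"""Hypothetical sketch: count IGHV substitutions and call SHM status.

Not the CRIS code. Assumes a precomputed gapped alignment of a contig
against its closest germline IGHV allele, and the conventional CLL
cutoff of 98% identity (an assumption, not taken from the source).
"""
from dataclasses import dataclass

# Assumption: conventional CLL threshold, percent identity to germline.
MUTATED_CUTOFF = 98.0


@dataclass
class ShmCall:
    substitutions: list[tuple[int, str, str]]  # (1-based alignment column, germline base, contig base)
    identity: float  # percent identity over columns where neither sequence has a gap
    status: str  # "mutated" or "unmutated"


def call_shm(germline: str, contig: str) -> ShmCall:
    """Enumerate substitutions and classify IGHV mutation status."""
    if len(germline) != len(contig):
        raise ValueError("sequences must be pre-aligned to equal length")
    substitutions = []
    compared = 0
    for column, (g, c) in enumerate(zip(germline.upper(), contig.upper()), start=1):
        if g == "-" or c == "-":
            continue  # gap columns are skipped; indels are not counted as substitutions
        compared += 1
        if g != c:
            substitutions.append((column, g, c))
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    identity = 100.0 * (compared - len(substitutions)) / compared
    status = "mutated" if identity < MUTATED_CUTOFF else "unmutated"
    return ShmCall(substitutions, identity, status)
```

For example, a 100-nt alignment with three substitutions gives 97.0% identity and is called mutated, whereas a single substitution gives 99.0% and is called unmutated. Skipping gap columns keeps the identity a substitution-only measure. A real pipeline would also need to decide how to treat indels and which region of the V gene to score.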
