VDJPipe: a pipelined tool for pre-processing immune repertoire sequencing data

Scott Christley, Mikhail K Levin, Florian Rubelt, John M Fonner, Inimary T Toby, William H Rounds, Richard H Scheuermann, Lindsay G Cowell, Walter Scarborough, Nancy L Monson

doi:10.1186/s12859-017-1853-z

Abstract

Background: Pre-processing of high-throughput sequencing data for immune repertoire profiling is essential to ensure high-quality input for downstream analysis. VDJPipe is a flexible, high-performance tool that can perform multiple pre-processing tasks in a single pass over the data files.

Results: Processing tasks provided by VDJPipe include base composition statistics calculation, read quality statistics calculation, quality filtering, homopolymer filtering, length and nucleotide filtering, paired-read merging, barcode demultiplexing, 5′ and 3′ PCR primer matching, and collapsing of duplicate reads. VDJPipe uses a pipeline approach in which multiple processing steps are performed in a sequential workflow, with the output of each step automatically passed as input to the next. The workflow is flexible enough to handle the complex barcoding schemes used in many immunosequencing experiments. Because VDJPipe is designed for computational efficiency, we evaluated its performance by comparing execution times with those of pRESTO, a widely used pre-processing tool for immune repertoire sequencing data. We found that VDJPipe requires less than 10% of the run time required by pRESTO.

Conclusions: VDJPipe is a high-performance tool that is optimized for pre-processing large immune repertoire sequencing data sets.
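The pipeline approach in the abstract, where each step's output feeds the next and all steps share one pass over the reads, can be sketched with chained Python generators. This is a minimal illustration under stated assumptions, not VDJPipe's implementation or configuration format. The `Read` type, the step functions, and the thresholds are hypothetical. Paired-read merging, barcode demultiplexing, and primer matching are omitted for brevity.

```python
from collections import Counter
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Iterable, Iterator, List

@dataclass
class Read:
    """Hypothetical single-end read: name, bases, per-base Phred scores."""
    name: str
    seq: str
    qual: List[int]

# A step consumes a stream of reads and yields the reads that survive it.
Step = Callable[[Iterator[Read]], Iterator[Read]]

def base_composition(counts: Counter) -> Step:
    """Statistics step: tallies bases as reads stream past, dropping none."""
    def step(reads: Iterator[Read]) -> Iterator[Read]:
        for r in reads:
            counts.update(r.seq)
            yield r
    return step

def min_length(n: int) -> Step:
    """Length filter: keep reads with at least n bases."""
    def step(reads: Iterator[Read]) -> Iterator[Read]:
        return (r for r in reads if len(r.seq) >= n)
    return step

def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base."""
    best = run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        best = max(best, run)
        prev = base
    return best

def max_homopolymer(max_run: int) -> Step:
    """Homopolymer filter: drop reads whose longest run exceeds max_run."""
    def step(reads: Iterator[Read]) -> Iterator[Read]:
        return (r for r in reads if longest_homopolymer(r.seq) <= max_run)
    return step

def min_mean_quality(threshold: float) -> Step:
    """Quality filter: keep reads whose mean Phred score meets threshold."""
    def step(reads: Iterator[Read]) -> Iterator[Read]:
        return (r for r in reads
                if r.qual and sum(r.qual) / len(r.qual) >= threshold)
    return step

def run_pipeline(reads: Iterable[Read], steps: List[Step]) -> Counter:
    """Chain the steps lazily, then collapse duplicates in the same pass."""
    stream = reduce(lambda s, step: step(s), steps, iter(reads))
    # Collapsing duplicates: map each unique sequence to its copy number.
    return Counter(r.seq for r in stream)

comp: Counter = Counter()
reads = [
    Read("r1", "ACGTACGT", [30] * 8),
    Read("r2", "ACGTACGT", [35] * 8),
    Read("r3", "AAAAAAGT", [30] * 8),  # 6-base homopolymer
    Read("r4", "ACGT", [30] * 4),      # too short
    Read("r5", "TTGCATGC", [10] * 8),  # low mean quality
]
unique = run_pipeline(reads, [base_composition(comp), min_length(6),
                              max_homopolymer(4), min_mean_quality(20)])
print(unique)  # Counter({'ACGTACGT': 2})
```

Because each step is a lazy generator, the input is iterated exactly once. A read removed by an early filter never reaches later steps, and the statistics step (placed first here) counts every input read. Placing it after the filters would instead report the composition of the surviving reads only.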

Highlights

Pre-processing of high-throughput sequencing data for immune repertoire profiling is essential to ensure high-quality input for downstream analysis.
We use two example data sets provided by pRESTO [14, 15], which are publicly available from SRA under accession IDs ERP003950 and SRX190717.

Summary

Results

We compare the performance of VDJPipe v0.1.7 with that of another software tool specialized for immunosequencing data, pRESTO v0.5.3 [13]. pRESTO takes an alternative design approach, providing a set of Python scripts, each of which performs one step in the pre-processing workflow. For the first data set, the processing steps are merging the paired-end reads into a single read sequence, quality filtering, 5′ and 3′ primer matching, and collapsing duplicate reads. For the second data set, the processing steps are length, homopolymer, and quality filtering, generating compositional statistics, barcode demultiplexing, 5′ and 3′ primer matching, and collapsing duplicate reads. Together, these two data sets exercise all the main functions provided by VDJPipe (Table 1).

Competing interests

The authors declare that they have no competing interests.


Journal: BMC Bioinformatics
Publication date: Oct 11, 2017
Citations: 14
License type: open-access
