Pipeliner: A Nextflow-Based Framework for the Definition of Sequencing Data Processing Pipelines.

Anthony Federico, Yusuke Koga, Tanya Karagiannis, Dileep Kishore, Kritika Karri, Joshua D Campbell, Stefano Monti

doi:10.3389/fgene.2019.00614

Open Access

https://doi.org/10.3389/fgene.2019.00614

Journal: Frontiers in Genetics	Publication Date: Jun 28, 2019
Citations: 17	License type: CC BY 4.0

Affiliation: Boston University

Abstract

The advent of high-throughput sequencing technologies has led to the need for flexible and user-friendly data preprocessing platforms. The Pipeliner framework provides an out-of-the-box solution for processing various types of sequencing data. It combines the Nextflow scripting language and the Anaconda package manager to generate modular computational workflows. We have used Pipeliner to create several pipelines for sequencing data processing, including bulk RNA-sequencing (RNA-seq), single-cell RNA-seq, and digital gene expression data. This report highlights the design methodology behind Pipeliner, which enables the development of highly flexible and reproducible pipelines that are easy to extend and maintain on multiple computing environments. We also provide a quick start user guide demonstrating how to set up and execute available pipelines with toy datasets.

Highlights

High-throughput sequencing (HTS) technologies are vital to the study of genomics and related fields
We argue that Pipeliner is a suitable choice for users looking for alternative reprocessing of The Cancer Genome Atlas (TCGA) datasets with minimal pipeline development
We apply the RNA-seq pipeline to real-world data by processing raw sequencing reads from the diffuse large B-cell lymphoma (DLBC) cohort provided by the TCGA and provide supplementary files that can be used to repeat the analysis or serve as a template for applying Pipeliner to other publicly available datasets

Summary

INTRODUCTION

High-throughput sequencing (HTS) technologies are vital to the study of genomics and related fields. Breakthroughs in cost efficiency have made it common for studies to obtain millions of raw sequencing reads. Processing these data requires a series of computationally intensive tools that can be unintuitive to use, difficult to combine into stable workflows that can handle large numbers of samples, and challenging to maintain over long periods of time in different environments. The effort to simplify this process has resulted in the development of sequencing pipelines such as RseqFlow (Wang et al., 2011), PRADA (Torres-García et al., 2014), and Galaxy (Goecks et al., 2010), among others. Pipelines developed within the Pipeliner framework are platform independent and fully reproducible, and they inherit automated job parallelization and failure recovery. Their flexibility and modular architecture allow users to customize and modify processes. Pipeliner is a complete and user-friendly solution to meet the demands of processing large amounts and various types of sequencing data.

Design and Features

CONCLUSIONS
