Abstract
The advent of high-throughput sequencing technologies has led to the need for flexible and user-friendly data preprocessing platforms. The Pipeliner framework provides an out-of-the-box solution for processing various types of sequencing data. It combines the Nextflow scripting language and the Anaconda package manager to generate modular computational workflows. We have used Pipeliner to create several pipelines for sequencing data processing, including bulk RNA-sequencing (RNA-seq), single-cell RNA-seq, and digital gene expression data. This report highlights the design methodology behind Pipeliner, which enables the development of highly flexible and reproducible pipelines that are easy to extend and maintain across multiple computing environments. We also provide a quick-start user guide demonstrating how to set up and execute the available pipelines with toy datasets.
Highlights
High-throughput sequencing (HTS) technologies are vital to the study of genomics and related fields
We argue that Pipeliner is a suitable choice for users looking for alternative reprocessing of The Cancer Genome Atlas (TCGA) datasets with minimal pipeline development
We apply the RNA-seq pipeline to real-world data by processing raw sequencing reads from the diffuse large B-cell lymphoma (DLBC) cohort provided by TCGA, and we provide supplementary files that can be used to repeat the analysis or serve as a template for applying Pipeliner to other publicly available datasets
Summary
High-throughput sequencing (HTS) technologies are vital to the study of genomics and related fields. Breakthroughs in cost efficiency have made it common for studies to obtain millions of raw sequencing reads. Processing these data requires a series of computationally intensive tools that can be unintuitive to use, difficult to combine into stable workflows that can handle large numbers of samples, and challenging to maintain over long periods of time in different environments. The effort to simplify this process has resulted in the development of sequencing pipelines such as RseqFlow (Wang et al, 2011), PRADA (Torres-García et al, 2014), and Galaxy (Goecks et al, 2010), among others. Pipelines developed within the Pipeliner framework are platform independent and fully reproducible, and they inherit automated job parallelization and failure recovery. Their flexibility and modular architecture allow users to customize and modify processes. Pipeliner is a complete and user-friendly solution to meet the demands of processing large amounts and various types of sequencing data.