NGSPERL: a semi-automated framework for large scale next generation sequencing data analysis

Quanhu Sheng, Shilin Zhao, Yu Shyr, Mingsheng Guo

doi:10.1504/ijcbdd.2015.072082

Abstract

High-throughput sequencing technologies have been widely used in medical and biological research, especially in cancer biology. With the huge amounts of sequencing data being generated, data analysis has become the bottleneck of the research procedure. We have designed and implemented NGSPERL, a semi-automated, module-based framework for high-throughput sequencing data analysis. Three major analysis pipelines, each comprising multiple tasks, have been developed for RNA sequencing, exome sequencing, and small RNA sequencing data. Each task was developed as a module. A module takes the output of the previous task as its input parameter and generates the corresponding portable batch system (PBS) script. The PBS scripts can be either submitted to a cluster or run directly, depending on the user's choice. Multiple tasks can also be combined into a single task to simplify the data analysis. Such a flexible framework will significantly automate and simplify the process of large-scale sequencing data analysis.
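The task chaining described in the abstract can be illustrated with a minimal sketch. It is written in Python for brevity and is not NGSPERL's actual Perl API: the `Task` class, the `combined_scripts` helper, the `my_aligner` and `my_counter` commands, the file paths, and the resource requests in the PBS header are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Illustrative PBS header; the resource requests are placeholder values.
PBS_HEADER = """#!/bin/bash
#PBS -N {job}
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00
cd $PBS_O_WORKDIR
"""


@dataclass
class Task:
    """One pipeline step (hypothetical structure, not NGSPERL's real interface)."""
    name: str
    command: str                      # template with {input} and {output} placeholders
    suffix: str                       # suffix of the files this task produces
    source: Optional["Task"] = None   # previous task whose outputs become the inputs
    files: Dict[str, str] = field(default_factory=dict)  # raw inputs of a first task

    def inputs(self) -> Dict[str, str]:
        # A downstream task reads the outputs of the task it depends on.
        return self.source.outputs() if self.source else dict(self.files)

    def outputs(self) -> Dict[str, str]:
        return {s: f"{self.name}/{s}{self.suffix}" for s in self.inputs()}

    def commands(self) -> Dict[str, str]:
        outs = self.outputs()
        return {s: self.command.format(input=path, output=outs[s])
                for s, path in self.inputs().items()}

    def pbs_scripts(self) -> Dict[str, str]:
        # One PBS script per sample.
        return {s: PBS_HEADER.format(job=f"{self.name}_{s}") + cmd + "\n"
                for s, cmd in self.commands().items()}


def combined_scripts(name: str, tasks: List[Task]) -> Dict[str, str]:
    """Merge several chained tasks into a single PBS script per sample."""
    scripts = {}
    for sample in tasks[0].inputs():
        body = "\n".join(t.commands()[sample] for t in tasks)
        scripts[sample] = PBS_HEADER.format(job=f"{name}_{sample}") + body + "\n"
    return scripts


# Hypothetical two-step chain: alignment followed by counting.
align = Task("align", "my_aligner {input} > {output}", ".bam",
             files={"S1": "raw/S1.fastq"})
count = Task("count", "my_counter {input} > {output}", ".count", source=align)
```

Each generated script could then be written to disk and either submitted with `qsub` or executed directly with `bash`, mirroring the two run modes mentioned above.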

Full Text