Abstract

Next-generation sequencing (NGS) has drastically enhanced human cancer research, but diverse sequencing strategies, complicated open-source software, and the identification of massive numbers of mutations have limited the clinical application of NGS. Here, we first present GPyFlow, a lightweight tool that flexibly customizes, executes, and shares workflows. We then introduce DIVIS, a customizable pipeline based on GPyFlow that integrates read preprocessing, alignment, variant detection, and annotation for whole-genome sequencing, whole-exome sequencing, and gene-panel sequencing. By default, DIVIS screens variants from multiple callers and generates a standard variant-detection format list containing caller evidence for each sample, which is compatible with advanced analyses. Lastly, DIVIS generates a statistical report that includes command lines, parameters, quality-control indicators, and a mutation summary. DIVIS substantially facilitates complex cancer genome sequencing analyses by means of a single powerful and easy-to-use command. The DIVIS code is freely available at https://github.com/niu-lab/DIVIS, and the docker image can be downloaded from https://hub.docker.com/repository/docker/sunshinerain/divis.

Highlights

  • Deciphering human cancer genome sequencing data is critical for the mapping of tumorigenesis and the development of targeted therapeutic strategies

  • The major focus of this research field is on cancer driver genes (CDGs) and cancer susceptibility genes (CSGs): CDGs are genes in which mutations confer on cells a growth advantage that helps tumors proliferate [1], and CSGs are genes in which mutations, typically inherited, increase the risk of certain types of cancer [2]

  • To simplify cancer genome sequencing analysis, facilitate workflow extension, and provide accurate mutation results, we presented GPyFlow and DIVIS, an easy-to-use, extensible, and customizable cancer genome sequencing analysis platform


Summary

INTRODUCTION

Deciphering human cancer genome sequencing data is critical for the mapping of tumorigenesis and the development of targeted therapeutic strategies. Bioinformatics development has promoted the continuous updating of software as well as the emergence of new bioinformatics tools; this requires high scalability and customizability of the analysis pipelines. Workflow management software such as Snakemake [23], Bpipe [24], Ruffus [25], Nextflow [26], and Galaxy [27] satisfies this requirement by automatically generating analysis scripts through graphical operation, which reduces the manual intervention required. Galaxy, a web-based platform, provides users with a visual workflow editor and several bioinformatics tools [27]. Although these software tools perform effectively in workflow execution and management, most of them require user knowledge of a domain-specific language (DSL) or a specific programming language (for example, Common Workflow Language).

The validation and performance tests were conducted on a computer cluster with an Intel Xeon E5-2680 v3 processor (2.5 GHz, 12 cores) and on a Linux machine running CentOS 6.4 with an Intel Xeon E5-2680 v2 CPU @ 2.80 GHz.
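The general idea behind the workflow managers discussed above can be illustrated with a minimal sketch in which a pipeline is a graph of steps that runs in dependency order. This is not the GPyFlow or DIVIS implementation: the step names are assumptions taken from the stages listed in the abstract, and `WORKFLOW`, `execution_order`, and `run` are hypothetical names chosen for illustration. It uses only the Python standard library (`graphlib`, Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical step graph: each key is a step, each value is the set of
# steps it depends on. Names mirror the stages named in the abstract and
# are NOT identifiers from GPyFlow or DIVIS.
WORKFLOW = {
    "preprocess": set(),
    "align": {"preprocess"},
    "call_variants": {"align"},
    "annotate": {"call_variants"},
    "report": {"annotate"},
}


def execution_order(workflow):
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(workflow).static_order())


def run(workflow, actions):
    """Call each step's action in dependency order; return the steps visited.

    Steps without an entry in `actions` are visited but do nothing, which
    stands in for swapping or omitting a tool when customizing a pipeline.
    """
    visited = []
    for step in execution_order(workflow):
        action = actions.get(step)
        if action is not None:
            action()
        visited.append(step)
    return visited


if __name__ == "__main__":
    print(execution_order(WORKFLOW))
```

Because the steps are declared as data rather than as a fixed script, adding or replacing a tool only means editing the graph; a cyclic graph makes `static_order()` raise `graphlib.CycleError` instead of running.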

