DIVIS: Integrated and Customizable Pipeline for Cancer Genome Sequencing Analysis and Interpretation

Xiaoyu He, Siyao Liu, Xinyin Han, Beifang Niu, Xintong Wang, Jiayin He, Xiaohong Duan, Danyang Yuan, Yu Zhang

doi:10.3389/fonc.2021.672597

Abstract

Next-generation sequencing (NGS) has drastically enhanced human cancer research, but diverse sequencing strategies, complicated open-source software, and the identification of massive numbers of mutations have limited the clinical application of NGS. Here, we first present GPyFlow, a lightweight tool that flexibly customizes, executes, and shares workflows. We then introduce DIVIS, a customizable pipeline based on GPyFlow that integrates read preprocessing, alignment, variant detection, and annotation for whole-genome sequencing, whole-exome sequencing, and gene-panel sequencing. By default, DIVIS screens variants from multiple callers and generates a standard variant-detection format list containing caller evidence for each sample, which is compatible with advanced analyses. Lastly, DIVIS generates a statistical report that includes command lines, parameters, quality-control indicators, and a mutation summary. DIVIS substantially facilitates complex cancer genome sequencing analyses by means of a single powerful and easy-to-use command. The DIVIS code is freely available at https://github.com/niu-lab/DIVIS, and the docker image can be downloaded from https://hub.docker.com/repository/docker/sunshinerain/divis.

Highlights

Deciphering human cancer genome sequencing data is critical for mapping tumorigenesis and developing targeted therapeutic strategies.
The major focus of this research field is on cancer driver genes (CDGs) and cancer susceptibility genes (CSGs): CDGs are genes in which mutations give cells a growth advantage that helps tumors proliferate [1], and CSGs are genes in which mutations, typically inherited, increase the risk of certain types of cancer [2].
To simplify cancer genome sequencing analysis, facilitate workflow extension, and provide accurate mutation results, we present GPyFlow and DIVIS, an easy-to-use, extensible, and customizable cancer genome sequencing analysis platform.

Summary

INTRODUCTION

Deciphering human cancer genome sequencing data is critical for mapping tumorigenesis and developing targeted therapeutic strategies. Bioinformatics development has promoted the continuous updating of software as well as the emergence of new bioinformatics tools; this requires analysis pipelines to be highly scalable and customizable. Workflow management software such as Snakemake [23], Bpipe [24], Ruffus [25], Nextflow [26], and Galaxy [27] satisfies this requirement by automatically generating analysis scripts through graphical operation, which reduces manual intervention. Galaxy, a web-based platform, provides users with a visual workflow editor and several bioinformatics tools [27]. Although these software tools perform effectively in workflow execution and management, most of them require user knowledge of a domain-specific language (DSL) or a specific programming language (for example, Common Workflow Language).

Validation and performance tests were conducted on a computer cluster with an Intel Xeon E5-2680 v3 processor (2.5 GHz, 12 cores) and on a Linux machine running CentOS 6.4 with an Intel Xeon E5-2680 v2 CPU (2.80 GHz).
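The core service that the workflow managers discussed above provide is running analysis steps in an order that respects their dependencies. The following Python sketch illustrates that idea only; it is not GPyFlow's or DIVIS's actual implementation. The step names are hypothetical labels that mirror the stages listed in the abstract, and the dependency map is an assumption made for illustration. It uses the standard-library graphlib module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each step maps to the set of steps it depends on.
# The names mirror the stages named in the abstract; they are NOT taken
# from the DIVIS or GPyFlow code base.
workflow = {
    "preprocess": set(),
    "align": {"preprocess"},
    "call_variants": {"align"},
    "annotate": {"call_variants"},
    "report": {"preprocess", "annotate"},
}


def execution_order(steps):
    """Return one execution order in which every step follows its dependencies."""
    return list(TopologicalSorter(steps).static_order())


order = execution_order(workflow)
for position, step in enumerate(order, start=1):
    print(f"{position}. {step}")
```

A real workflow manager would attach a command line to each step and could run independent steps in parallel; this sketch covers only the ordering.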

METHODS

RESULTS

DISCUSSION