Abstract

As its capacity steadily improves and its price falls, genomic sequencing is becoming a routine part of medical practice for cancer patients. Variant calling in sequencing data is a fundamental prerequisite for any downstream analysis and therefore plays a critical role in both basic research and clinical care of cancers. However, most available tools remain notably time-consuming (e.g., MuSE 1.0 and MuTect2 each take ~2 days to process one tumor-normal pair of whole-genome sequencing (WGS) data). Here, we introduce MuSE 2.0, which keeps the same input and output as our previously released MuSE 1.0 but runs significantly faster on both whole-exome sequencing (WES) and WGS data. MuSE 2.0 employs a multithreaded producer-consumer model and the OpenMP library to parallelize every stage: parsing and uncompressing reads from BAM files, detecting and filtering variants, and writing output. Using sample data from ICGC and TCGA, we benchmarked the speed of MuSE 2.0 against MuSE 1.0, MuTect2, SomaticSniper and VarScan2, the somatic mutation callers chosen for the GDC DNA-seq analysis pipeline (https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/DNA_Seq_Variant_Calling_Pipeline/). Compared to MuSE 1.0, MuSE 2.0 with 80 cores reduces the time cost of SNV calling from ~40 hours per pair of tumor-normal WGS data to less than 1 hour, and from 2-4 hours per pair of tumor-normal WES data to 4-5 minutes (a ~50-60 times speedup). It is also up to 60 times faster than the three other callers (MuTect2, VarScan2 and SomaticSniper) on both WGS and WES data. These results show that MuSE 2.0 is a time-efficient tool that is expected to remove somatic mutation calling as a time-consuming obstacle to cancer genomic studies and clinicians' decision-making.
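The producer-consumer pattern named above can be illustrated with a minimal C++ sketch. It is not taken from the MuSE source code: `BoundedQueue`, `count_passing`, the queue capacity of 4, and the depth-threshold "work" are all hypothetical stand-ins, and plain `std::thread` is used here in place of OpenMP to keep the example free of compiler flags. One producer thread feeds chunks (standing in for uncompressed blocks of reads) into a bounded queue, and several consumer threads drain it in parallel.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical bounded queue for illustration; not the MuSE implementation.
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {}

    // Blocks while the queue is full, so the producer cannot outrun memory.
    void push(T item) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return items_.size() < capacity_; });
        items_.push(std::move(item));
        not_empty_.notify_one();
    }

    // Blocks until an item arrives; returns false once closed and drained.
    bool pop(T& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !items_.empty() || closed_; });
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop();
        not_full_.notify_one();
        return true;
    }

    // Signals that no more items will be pushed.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        not_empty_.notify_all();
    }

private:
    std::size_t capacity_;
    std::queue<T> items_;
    std::mutex mutex_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
    bool closed_ = false;
};

// Stand-in for per-chunk analysis: counts values at or above min_depth.
long count_passing(const std::vector<std::vector<int>>& chunks,
                   int min_depth, unsigned n_consumers) {
    if (n_consumers == 0) n_consumers = 1;  // avoid a producer that never drains
    BoundedQueue<std::vector<int>> queue(4);  // capacity 4 is an arbitrary choice
    std::mutex total_mutex;
    long total = 0;

    // Producer: hands chunks to the queue, then closes it.
    std::thread producer([&] {
        for (const auto& chunk : chunks) queue.push(chunk);
        queue.close();
    });

    // Consumers: each accumulates a local count, then merges it once.
    std::vector<std::thread> consumers;
    for (unsigned i = 0; i < n_consumers; ++i) {
        consumers.emplace_back([&] {
            std::vector<int> chunk;
            long local = 0;
            while (queue.pop(chunk)) {
                for (int depth : chunk) {
                    if (depth >= min_depth) ++local;
                }
            }
            std::lock_guard<std::mutex> lock(total_mutex);
            total += local;
        });
    }

    producer.join();
    for (auto& t : consumers) t.join();
    return total;
}
```

Calling `count_passing(chunks, 5, 3)` would process the chunks with three consumer threads. The bounded capacity is the design point worth noting: it lets reading and processing overlap while capping how much decompressed data sits in memory at once.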
MuSE 1.0 was adopted in multiple large-scale pipelines, including as a major contributing caller to the final consensus calls of the TCGA PanCanAtlas project across 13,000 tumor samples and of the ICGC Pan-Cancer Analysis of Whole Genomes (PCAWG) initiative across 2,700 tumor samples. We therefore expect MuSE 2.0 to significantly accelerate the variant calling process and benefit the scientific and clinical communities. MuSE 2.0 is implemented in C++ and is freely available on GitHub at https://github.com/wwylab/MuSE.

Citation Format: Shuangxi Ji, Tong Zhu, Ankit Sethia, Matthew D. Montierth, Wenyi Wang. Accelerated somatic mutation calling tool for whole-genome and whole-exome sequencing data from heterogenous tumor samples [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 2070.
