Abstract

Introduction: The evolution of precision oncology will demand the integration of multi-omics data, clinical data, demographics, and outcomes, which is a complex and computationally intensive endeavor. Accurate and robust molecular characterization of pediatric leukemias requires a comprehensive approach involving longitudinal sample collection across a large cohort. Genomic alterations, both germline and somatic, need to be efficiently captured and annotated to identify clinically meaningful trends. Algorithms developed for genomic analysis typically rely on a central processing unit (CPU) environment for data processing and on traditional statistical approaches, such as Bayes' theorem, for variant detection. More recently, algorithms have been developed that leverage graphics processing unit (GPU) architectures and machine learning (ML). In this study, we benchmarked these techniques and applied them to sequencing data generated from pediatric AML subjects enrolled on the Children's Oncology Group AAML1031 trial (951 samples).

Methods: Whole genome sequencing was performed on the Illumina platform (2x150) at 30x-60x coverage. Genome alignment files were generated and subjected to germline and somatic variant calling on NVIDIA's Selene DGX A100 cluster using Parabricks pipelines. Germline variants were called using HaplotypeCaller and DeepVariant (Parabricks v3.0). The Genome in a Bottle HG002 sample was used as a positive control to determine precision, recall, and F1 score. Somatic variant calling was performed using SomaticSniper, VarScan2, MuSE, MuTect2, and Strelka2 (Parabricks v3.5), and SEQC-II data were used to assess somatic variant calling performance.

Results: Control sample HG002 was used to benchmark the germline algorithms. At all depths analyzed (50x, 30x, 15x, and 5x), Google's DeepVariant had a higher F1 score, precision, and recall than HaplotypeCaller for both SNPs and InDels.
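Precision, recall, and F1 against a truth set such as HG002 can be illustrated with a minimal sketch. It assumes variants have already been normalized into simple (chromosome, position, REF, ALT) keys and matches them exactly. The abstract does not state which comparison tool was used; dedicated benchmarking tools such as hap.py or RTG vcfeval additionally perform haplotype-aware matching and restrict the comparison to high-confidence regions, so this is not how the reported numbers were produced. The `Variant` and `benchmark` names and the toy variants are hypothetical.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical simplified variant key: chromosome, 1-based position, REF, ALT."""
    chrom: str
    pos: int
    ref: str
    alt: str


def benchmark(calls: set[Variant], truth: set[Variant]) -> dict[str, float]:
    """Compare a call set with a truth set using exact allele matching.

    TP: calls present in the truth set; FP: calls absent from it;
    FN: truth variants that were not called.
    """
    tp = len(calls & truth)
    fp = len(calls - truth)
    fn = len(truth - calls)
    precision = tp / (tp + fp) if calls else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy data, not from the study: three shared calls, one missed, one spurious.
    truth = {Variant("chr1", 100, "A", "G"), Variant("chr1", 200, "C", "T"),
             Variant("chr2", 50, "AT", "A"), Variant("chr3", 10, "G", "C")}
    calls = {Variant("chr1", 100, "A", "G"), Variant("chr1", 200, "C", "T"),
             Variant("chr2", 50, "AT", "A"), Variant("chr5", 5, "T", "A")}
    print(benchmark(calls, truth))
```

With three true positives, one false positive, and one false negative, precision, recall, and F1 all come out to 0.75 for the toy data.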
For somatic SNV detection on the SEQC-II dataset, Strelka2 had the highest F1 score, precision, and recall (0.9435, 0.9445, and 0.9425, respectively), followed by MuTect2, MuSE, VarScan2, and SomaticSniper. For somatic InDel detection, MuTect2 and Strelka2 performed similarly: MuTect2 had higher precision, whereas Strelka2 had higher recall. These pipelines were used to analyze 951 pediatric AML samples and demonstrated faster processing and superior identification of both germline and somatic variants.

Discussion: ML algorithms integrated into a GPU computing environment demonstrated greater precision and accuracy for germline variant detection than traditional statistical approaches. Ongoing efforts are focused on integrating multiple data sources, correlating them with outcomes, and analyzing them with ML tools in order to improve future risk stratification and therapeutic selection for better outcomes.

Citation Format: Erin L. Crowgey, Pankaj Vats, Karl Franke, Gary Burnett, Ankit Sethia, Timothy Harkins, Todd E. Druley. Enhanced processing of genomic sequencing data for pediatric cancers: GPUs and machine learning techniques for variant detection [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2021; 2021 Apr 10-15 and May 17-21. Philadelphia (PA): AACR; Cancer Res 2021;81(13_Suppl):Abstract nr 165.
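The three Strelka2 somatic SNV figures are internally consistent, because F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). A minimal check using only the precision (0.9445) and recall (0.9425) reported in the Results; the `f1_score` function name is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Strelka2 somatic SNV precision and recall on SEQC-II, as reported in the abstract.
strelka2_f1 = f1_score(0.9445, 0.9425)
print(f"{strelka2_f1:.4f}")  # 0.9435
```

Because precision and recall are nearly equal here, the harmonic mean sits just below their arithmetic mean and rounds to the reported 0.9435.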
