Abstract
The Gabriella Miller Kids First Pediatric Research Program (Kids First) aims to help researchers uncover new insights into the biology of childhood cancer (CC) and structural birth defects (SBD), including the discovery of genetic pathways shared between these disorders. Kids First has two initiatives: whole genome sequencing (WGS) of biospecimens from families with CC/SBD, and the establishment of the Kids First Data Resource. The Kids First Data Resource Center (KFDRC) developed the Kids First Data Resource Portal (KFDRP; https://portal.kidsfirstdrc.org/), a centralized platform to search, view, analyze, and identify currently accessible data from both Kids First and collaborative cohorts, incorporating omics and phenotypic information from 30 studies and 26,300 participants. KFDRC recently released two new KFDRP components, Variant DataBase (VDB) and Variant WorkBench (VWB), which enable users to query, wrangle, analyze, and visualize germline genomic variants. The current release includes ~309 million unique variants in a matrix of more than 61.3 billion individual-chromosomal position occurrences from over 11,500 participants in 17 studies. While VDB provides a quick variant summary, VWB supports in-depth analysis in Python, Spark, SQL, R, and Markdown through Apache Zeppelin notebooks. In addition to variant calls and phenotypic information such as Human Phenotype Ontology (HPO) terms and Mondo IDs, VWB hosts rich external variant annotations in the public domain, such as Cancer Hotspots, ClinVar, COSMIC, dbNSFP, gnomAD, and TOPMed, as well as gene-phenotype links provided by OMIM, HPO, Orphanet, and the Deciphering Developmental Disorders Project.
Users can also load additional databases within a notebook (such as the subscription-based Human Gene Mutation Database [HGMD]), import custom datasets as temporary query tables, export analysis outputs to local drives, visualize analysis results in multiple chart styles, display local figures, and save notebooks for sharing, for further use, and for Cavatica projects. To screen tier 1 genes (n=578) from the most recent Cancer Gene Census provided by COSMIC for rare deleterious variants in all six CC cohorts currently available in KFDRP, we identified over 1.2 million germline variants that either (a) have a minor allele frequency of no more than 0.00001 in the gnomAD/TOPMed datasets and a ratio of “damaging” predictions to all predictions in dbNSFP between 0.5 and 1, or (b) are already cataloged in the most recent version of ClinVar/HGMD. The whole process took less than two hours, much faster than conventional methods. These variant tools in KFDRP enable efficient genomic variant analysis in cancer research with advanced big data technology.

Citation Format: Yiran Guo, Jeremy Costanza, Christophe Botek, Miguel Brown, David Higgins, Yuankun Zhu, Bailey Farrow, Allison Heath, Adam Resnick, Vincent Ferretti, Kids First Data Resource Center. Making discoveries with Kids First Variant DataBase and WorkBench [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2022; 2022 Apr 8-13. Philadelphia (PA): AACR; Cancer Res 2022;82(12_Suppl):Abstract nr LB501.
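
The screening criteria stated in the abstract (gene panel membership, a minor allele frequency cutoff of 0.00001, a dbNSFP "damaging" ratio between 0.5 and 1, or an existing ClinVar/HGMD record) can be illustrated with a minimal plain-Python sketch. This is not the VWB notebook code or its schema: the `Variant` fields, the helper names, the two-gene panel, and the toy records are all hypothetical, and the handling of missing frequencies and of the and/or grouping is an assumption about how the criteria combine.

```python
from dataclasses import dataclass
from typing import Optional

MAX_AF = 0.00001           # allele frequency cutoff quoted in the abstract
MIN_DAMAGING_RATIO = 0.5   # lower bound of the damaging/all prediction ratio


@dataclass
class Variant:
    # Hypothetical record layout, for illustration only.
    name: str
    gene: str
    gnomad_af: Optional[float]   # None = variant absent from the dataset
    topmed_af: Optional[float]
    damaging_predictions: int    # dbNSFP predictors calling it damaging
    total_predictions: int       # dbNSFP predictors with any call
    in_clinvar_or_hgmd: bool


def is_rare(v: Variant) -> bool:
    # Assumption: a variant missing from a dataset counts as rare there.
    observed = [af for af in (v.gnomad_af, v.topmed_af) if af is not None]
    return all(af <= MAX_AF for af in observed)


def damaging_ratio(v: Variant) -> float:
    # Assumption: no predictions at all is treated as a ratio of 0.
    if v.total_predictions == 0:
        return 0.0
    return v.damaging_predictions / v.total_predictions


def passes_screen(v: Variant, gene_panel: set) -> bool:
    if v.gene not in gene_panel:
        return False
    predicted = is_rare(v) and MIN_DAMAGING_RATIO <= damaging_ratio(v) <= 1.0
    # Assumed grouping: (rare AND predicted damaging) OR already cataloged.
    return predicted or v.in_clinvar_or_hgmd


# Toy stand-in for the 578 tier 1 genes; the records below are invented.
panel = {"TP53", "BRCA2"}
variants = [
    Variant("v1", "TP53", 0.000005, None, 6, 10, False),  # rare and damaging
    Variant("v2", "TP53", 0.01, 0.02, 9, 10, False),      # too common
    Variant("v3", "BRCA2", 0.001, None, 0, 10, True),     # cataloged
    Variant("v4", "GENE_X", 0.0, 0.0, 10, 10, True),      # outside the panel
    Variant("v5", "TP53", 0.0, 0.0, 2, 10, False),        # ratio too low
]
hits = [v.name for v in variants if passes_screen(v, panel)]
print(hits)
```

At the scale reported in the abstract, the same predicate would more plausibly be expressed as a Spark or SQL filter inside a notebook; the pure-Python form is used here only to keep the logic easy to read.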