The Gabriella Miller Kids First Pediatric Research Program (Kids First) aims to help researchers uncover new insights into the biology of childhood cancer (CC) and structural birth defects (SBD). Kids First has two initiatives: (i) whole genome sequencing of biospecimens from families with CC/SBD, and (ii) establishing the Kids First Data Resources. The Kids First Data Resource Center developed the Kids First Data Resource Portal (KFDRP), a centralized platform to search, view, analyze, and identify currently accessible data from both Kids First and collaborative cohorts, incorporating omics and phenotypic information from 30 studies and 26,300 participants. A recently released KFDRP component is the Variant WorkBench (VWB), which enables users to query, manipulate, analyze and visualize genomic variants from participating cohorts, one of which is the Children’s Brain Tumor Network (CBTN). VWB supports Python, R, SQL and Spark for in-depth analysis in Apache Zeppelin notebooks. In addition to variant calls and phenotypic information, VWB hosts rich external variant annotations in the public domain, such as Cancer Hotspots, COSMIC and ClinVar. Users can also load additional databases (e.g. the Human Gene Mutation Database, HGMD) within a notebook, import custom datasets as temporary query tables, export analysis outputs to local drives, visualize analysis results in multiple chart styles, display local figures, and save notebooks for sharing, further use, and use in Cavatica projects. Screening the CBTN cohort for variants in the tier 1 genes (n=578) of the most recent Cancer Gene Census provided by COSMIC, we identified ~127,500 germline variants that are either rare and damaging or already cataloged in the most recent version of ClinVar/HGMD. The whole process took less than one hour, which is much faster than conventional methods.
VWB enables efficient genomic variant analysis and discovery in pediatric neuro-oncology research using advanced big data technology.
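The screening criterion described above (variants in tier 1 genes that are rare and damaging, or already cataloged in ClinVar/HGMD) can be illustrated with a minimal Python sketch. This is not the VWB notebook code: the `Variant` fields, the `screen_variants` function, the 0.001 allele-frequency cutoff, and the choice to treat a missing population frequency as rare are all hypothetical assumptions made for illustration, since the abstract does not state the actual thresholds or table schema.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set

# Hypothetical cutoff: the abstract does not say what "rare" means numerically.
RARE_AF_THRESHOLD = 0.001


@dataclass
class Variant:
    """Illustrative record; field names are assumptions, not the VWB schema."""
    gene: str
    population_af: Optional[float]  # population allele frequency, None if unobserved
    predicted_damaging: bool        # e.g. flagged by an in-silico predictor
    in_clinvar: bool                # already cataloged in ClinVar
    in_hgmd: bool                   # already cataloged in HGMD


def is_rare(variant: Variant, threshold: float = RARE_AF_THRESHOLD) -> bool:
    # Assumption: a variant absent from the population database counts as rare.
    return variant.population_af is None or variant.population_af < threshold


def screen_variants(variants: Iterable[Variant], tier1_genes: Set[str]) -> List[Variant]:
    """Keep variants in tier 1 genes that are (rare AND damaging) OR cataloged."""
    kept = []
    for v in variants:
        if v.gene not in tier1_genes:
            continue
        rare_and_damaging = is_rare(v) and v.predicted_damaging
        cataloged = v.in_clinvar or v.in_hgmd
        if rare_and_damaging or cataloged:
            kept.append(v)
    return kept
```

In a notebook, the same predicate would more likely be expressed as a SQL or Spark filter over joined annotation tables; the plain-Python form is used here only to make the boolean logic explicit.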