Abstract

Background: Effective target discovery strategies are key to developing new precision medicines against cancer. Large-scale efforts to systematically characterize cancer-associated germline variants, genomic changes in tumors, and cancer genetic dependencies via CRISPR and RNAi in cell lines provide valuable resources for machine learning (ML) approaches designed to identify disease-modifying targets. However, fundamental challenges remain in understanding how various data types (including primary tumor and cell line data) can be jointly modeled to decipher the molecular network contexts underpinning the biology and efficacy of new target candidates.

Approach: Using a data-driven knowledge graph spanning >2 billion relations over >350,000 biomedical entities, we assembled a comprehensive target prioritization resource with 53 feature types including multi-omics (germline and tumor) and genetic dependency data. Feature categories consisted of raw feature types (e.g. germline genetic, somatic mutation, copy number variation, transcriptomic, proteomic, survival associations, CRISPR/RNAi sensitivity, selectivity, efficacy and essentiality), network-transformed features capturing the underlying molecular network context of each possible target, and multidimensional network integration features combining systems-level information from multiple feature types. Multiple ML models were trained either on omics-based features only or on both omics and genetic dependency features, using information from each cancer type separately as well as jointly in multi-disease models.

Results: Individual features and the resulting ML models were evaluated using known drug targets in 20 cancer types. We found that a) germline genetic and survival association features, along with genetic dependency (efficacy and selectivity) features, were the most predictive prior to network transformation; b) network-based feature transformation raised the predictive power of individual features (e.g. increasing mean AUPRC from 0.024 to 0.064 for germline genetic features and from 0.018 to 0.048 for cell line sensitivity features); c) network integration of multiple feature types increased mean AUPRC by 21% over the best individual omics features; d) ML models integrating both omics and genetic dependency data outperformed omics-only ML approaches (mean AUPRC 0.219 vs. 0.195); and e) multi-disease ML models provided the best overall performance (mean AUPRC 0.232), underlining the value of multi-dimensional data integration and information transfer across cancer types. The resulting network-based ML models provide a highly interpretable view into the top-scoring genes for each cancer, prioritizing established targets, targets recently under evaluation (e.g. FGFR2 and FGFR3 in lung squamous cell carcinoma) and novel target candidates.

Citation Format: Janusz Dutkowski, Radosław Bielecki, Karol Nienałtowski, Michał Kukiełka, Roy Ronen. Knowledge graph integration of germline, primary cancer and cancer genetic dependency data prioritizes new target candidates [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2022; 2022 Apr 8-13. Philadelphia (PA): AACR; Cancer Res 2022;82(12_Suppl):Abstract nr LB148.
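The evaluation described above ranks genes by a feature or model score and measures how well known drug targets are recovered, summarized as AUPRC. The abstract does not specify how AUPRC was computed or how features were network-transformed, so the following is only an illustrative sketch under stated assumptions: average precision is used as a common estimator of AUPRC, and a one-step neighbor average stands in for the network transformation. The function names, gene labels, scores, edges, and the `alpha` mixing weight are all hypothetical and are not taken from the study.

```python
def average_precision(scores, positives):
    """Average precision (a common AUPRC estimator) for a scored gene list.

    scores:    dict mapping gene -> score (higher = stronger candidate)
    positives: set of genes treated as known targets
    """
    ranked = sorted(scores, key=lambda g: scores[g], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, gene in enumerate(ranked, start=1):
        if gene in positives:
            hits += 1
            precision_sum += hits / rank  # precision at each recovered target
    n_pos = len(positives & set(scores))
    return precision_sum / n_pos if n_pos else 0.0


def neighbor_smooth(scores, edges, alpha=0.4):
    """Hypothetical stand-in for a network transformation: mix each gene's
    score with the mean score of its direct network neighbors."""
    neighbors = {g: set() for g in scores}
    for a, b in edges:
        if a in neighbors and b in neighbors:
            neighbors[a].add(b)
            neighbors[b].add(a)
    smoothed = {}
    for gene, score in scores.items():
        if neighbors[gene]:
            mean_nb = sum(scores[n] for n in neighbors[gene]) / len(neighbors[gene])
            smoothed[gene] = (1 - alpha) * score + alpha * mean_nb
        else:
            smoothed[gene] = score  # isolated genes keep their raw score
    return smoothed


# Toy data (hypothetical): five genes, two of which are "known targets".
toy_scores = {"A": 0.9, "B": 0.1, "C": 0.8, "D": 0.2, "E": 0.3}
toy_targets = {"A", "B"}
toy_edges = [("A", "B"), ("C", "D")]

raw_ap = average_precision(toy_scores, toy_targets)
net_ap = average_precision(neighbor_smooth(toy_scores, toy_edges, alpha=0.4), toy_targets)
print(f"raw AP = {raw_ap:.3f}, network-smoothed AP = {net_ap:.3f}")
```

In this toy setup, the low-scoring target B moves up the ranking because it is linked to the high-scoring target A, which is the intuition behind scoring a gene in its network context. Because known targets are a small fraction of all genes, AUPRC values are expected to be low in absolute terms and are best read relative to one another.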
