Abstract

Background: Large-scale genomics studies (e.g. AACR Project GENIE, TCGA, TOPMed) have sequenced thousands of patients in an attempt to understand disease-associated genomic variables and their clinical correlates. Existing online platforms (e.g. cBioPortal) enable simple gene-based queries, but do not allow more complex modeling to understand disease pathogenesis, risk and outcome. There is an urgent need for an interactive, modular and scalable platform that enables users to perform multivariate machine learning on existing genomic data.
Results: We have built a platform, PrismML, that enables a user to interactively query a dataset and to run a range of machine learning tools, from simple statistical tests for differential analysis to multivariate modeling to predict clinical response or mortality risk. Since machine learning models are computationally intensive, we use cloud computing to make the analyses faster and more scalable. Key features of our platform are: (1) availability of extensive statistical and machine learning methods; (2) implementation of best practices for machine learning, e.g. cross-validation; (3) graphical querying of results to understand the interplay among features. Users can choose to analyze existing data/studies, or upload their own data. Examples of possible queries: “identify genomic features that distinguish metastasis from primary tumors, either in a single cancer or pan-cancer”, or “build a machine learning model to predict survival within ER- breast cancer patients”. There is also an acute need to integrate the knowledge extracted from the many available data types. To this end, we have integrated multiple data types into gene-scores, and have incorporated known biological/functional information by integrating gene-scores into pathway-scores.
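The abstract does not specify how gene-scores are combined into pathway-scores or how cross-validation is set up. The following is only a minimal sketch under stated assumptions: pathway-scores are taken as the mean of member gene-scores, folds are contiguous and unshuffled, and the names `pathway_scores`, `k_fold_indices`, and the example pathways are hypothetical and not part of PrismML.

```python
from statistics import mean


def pathway_scores(gene_scores, pathways):
    """Average per-gene scores over each pathway's member genes.

    gene_scores: {sample_id: {gene: score}}
    pathways:    {pathway_name: [gene, ...]}
    Assumption: a plain mean is used; the platform's actual rule is not stated.
    Genes missing from a sample are skipped; a pathway with no scored
    genes in a sample gets None.
    """
    result = {}
    for sample, scores in gene_scores.items():
        result[sample] = {}
        for name, genes in pathways.items():
            present = [scores[g] for g in genes if g in scores]
            result[sample][name] = mean(present) if present else None
    return result


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k-fold cross-validation.

    Folds are contiguous and unshuffled to keep the sketch deterministic.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most 1.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


# Hypothetical toy data, for illustration only.
example_gene_scores = {
    "s1": {"TP53": 1.0, "BRCA1": 0.0, "ESR1": 0.5},
    "s2": {"TP53": 0.0, "ESR1": 1.0},
}
example_pathways = {
    "dna_repair": ["TP53", "BRCA1"],
    "hormone_signaling": ["ESR1"],
}
example_result = pathway_scores(example_gene_scores, example_pathways)
example_folds = list(k_fold_indices(5, 2))
```

In practice, samples would usually be shuffled or stratified by outcome before splitting, and the pathway-scores would serve as model features evaluated on each held-out fold.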
Summary: PrismML is an interactive and flexible platform that brings the power of machine learning and statistical modeling to the genomics community. The platform is under active development, with ongoing work on features such as integrating multiple datasets to increase statistical power in rare diseases and subsetting large disease cohorts to identify prognostic features.
Citation Format: Anupama Reddy, Daisy Flemming, Sara Selitsky, Ana Brandusa Pavel, Gabriela Alexe, Gyan Bhanot. PrismML: A machine learning platform to query genotype-phenotype patterns in large genomics studies [abstract]. In: Proceedings of the Annual Meeting of the American Association for Cancer Research 2020; 2020 Apr 27-28 and Jun 22-24. Philadelphia (PA): AACR; Cancer Res 2020;80(16 Suppl):Abstract nr 858.
