Abstract

The use of gene expression data has been crucial to the functional characterization of changes in molecular pathway activity and to the identification of targets for novel treatments. However, interpreting these data is complicated by their high dimensionality and by the difficulty of identifying biological signals within a list of differentially expressed genes. Gene Set Enrichment Analysis (GSEA) is a standard method for identifying pathway enrichment in gene expression data. It tests whether the members of a gene set, whose expression would indicate the activity of a specific process or phenotype, are coordinately up- or downregulated more than would be expected by chance. Because GSEA relies on high-quality gene sets with coordinately regulated member genes, we maintain the Molecular Signatures Database (MSigDB), which contains nine collections of curated gene sets representing different biological pathways and processes. Over time, we have observed that some MSigDB gene sets, especially those that are manually curated or defined in a very specific biological context, may not provide a sufficiently sensitive and specific co-regulation signature. In response, we have created a data-driven refinement method based on matrix factorization to build more sensitive and specific gene sets. This method incorporates large-scale datasets from multiple sources, such as the Cancer Dependency Map, as well as curated protein-protein interaction networks. We will present initial results from this refinement method and our ongoing work, which will yield a new collection of refined gene sets to be made freely available in MSigDB for use with GSEA and many other applications.

Citation Format: Alexander T. Wenzel, Pablo Tamayo, Jill P. Mesirov. Data driven refinement of gene signatures for enrichment analysis and cell state characterization. [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 4281.
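The abstract summarizes GSEA as testing whether a gene set's members are coordinately up- or downregulated more than expected by chance. The minimal sketch below illustrates that idea with a weighted running-sum enrichment score in the style of the published GSEA statistic, plus a simplified permutation test. It is not the GSEA or MSigDB software, and it is not the authors' matrix-factorization refinement method, which the abstract does not describe. The function names, the weight exponent `p`, and the use of random gene-set permutation (rather than phenotype-label permutation, and without a normalized enrichment score) are assumptions made for illustration.

```python
import random
from typing import Sequence, Set


def enrichment_score(ranked_genes: Sequence[str], scores: Sequence[float],
                     gene_set: Set[str], p: float = 1.0) -> float:
    """Weighted running-sum enrichment score (illustrative GSEA-style sketch).

    ranked_genes: genes sorted from most upregulated to most downregulated.
    scores: ranking metric for each gene, in the same order as ranked_genes.
    gene_set: the (hypothetical) gene set being tested.
    p: weight exponent; p=0 gives an unweighted, Kolmogorov-Smirnov-like walk.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list without covering it")
    # Normalizer so that all "hit" steps together sum to +1.
    hit_weight = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if hit_weight == 0:
        raise ValueError("gene set members all have zero weight")
    running, max_dev = 0.0, 0.0
    for s, h in zip(scores, hits):
        # Step up on a member gene (weighted by its score), down on a non-member.
        running += abs(s) ** p / hit_weight if h else -1.0 / n_miss
        # Keep the signed deviation farthest from zero.
        if abs(running) > abs(max_dev):
            max_dev = running
    return max_dev


def permutation_pvalue(ranked_genes: Sequence[str], scores: Sequence[float],
                       gene_set: Set[str], n_perm: int = 1000, seed: int = 0) -> float:
    """Empirical p-value from random gene sets of the same size (simplified null)."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked_genes, scores, gene_set)
    k = sum(g in gene_set for g in ranked_genes)
    extreme = 0
    for _ in range(n_perm):
        random_set = set(rng.sample(list(ranked_genes), k))
        es = enrichment_score(ranked_genes, scores, random_set)
        # Count permutations at least as extreme, in the same direction.
        if (observed >= 0 and es >= observed) or (observed < 0 and es <= observed):
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    genes = [f"G{i}" for i in range(20)]
    metric = [float(10 - i) for i in range(20)]  # 10 (up) ... -9 (down)
    top_set = {"G0", "G1", "G2", "G3"}           # members at the top of the list
    print(enrichment_score(genes, metric, top_set))
    print(permutation_pvalue(genes, metric, top_set))
```

A gene set concentrated at the top of the ranking yields an enrichment score near +1, and one concentrated at the bottom yields a score near -1. The permutation step asks how often a random set of the same size scores at least that extremely, which is the "more than would be expected by chance" comparison in the abstract.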