Abstract
Pathway enrichment analysis is pivotal for elucidating key oncogenic processes from high-throughput cancer profiling. However, the inherent redundancy and interdependence of biological pathway annotations complicate the interpretation of enrichment results, which makes effective clustering of enriched pathways necessary. Traditional clustering methods rely predominantly on structural metrics, such as gene overlap or relationships within hierarchical pathway graphs. However, these methods do not account for the biological context of the experiment and may therefore overlook critical, context-specific insights tied to cancer type, stage, and other key factors. To address these limitations, we present a context-aware approach for clustering and interpreting pathway enrichment results using emerging large language models (LLMs). Our method involves: 1) generating contextually enriched pathway summaries from the original pathway definitions, using LLMs with prompt engineering techniques; 2) embedding these summaries into high-dimensional quantitative representations to capture pathway-level contextual semantics; and 3) clustering pathways into biologically coherent themes based on pairwise similarities between embeddings. We applied this approach to a case study of 144 up-regulated Reactome pathways from pediatric acute myeloid leukemia (AML) samples compared with normal controls in the TARGET dataset. A widely used overlap-based method, EnrichmentMap, yielded clusters with uneven sizes and fragmented groupings: the largest cluster contained 79 pathways spanning diverse processes, while immune-related pathways were split across several separate clusters. In contrast, our LLM-based approach produced more cohesive and interpretable results, with 11 clusters ranging from 5 to 25 pathways. These clusters delineated distinct biological processes, such as platelet activation and coagulation, and chromatin remodeling.
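Step 3 of the method (clustering pathways from pairwise similarities between summary embeddings) can be illustrated with a minimal sketch. The abstract does not name the embedding model, the similarity measure, or the clustering algorithm, so this example assumes cosine similarity and average-linkage agglomerative clustering with a stopping threshold. The function names `cosine_similarity` and `cluster_pathways`, the threshold of 0.8, and the three-dimensional vectors are all hypothetical stand-ins; real LLM text embeddings have far more dimensions, and steps 1 and 2 (LLM summarization and embedding) are not shown.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cluster_pathways(embeddings, threshold=0.8):
    """Average-linkage agglomerative clustering of pathway embeddings (assumed method).

    Repeatedly merges the two clusters with the highest mean pairwise
    cosine similarity, stopping once no pair of clusters reaches `threshold`.
    """
    names = list(embeddings)
    # Pairwise similarity between every pair of pathway embeddings
    sim = {
        (a, b): cosine_similarity(embeddings[a], embeddings[b])
        for a in names
        for b in names
    }
    clusters = [[name] for name in names]
    while len(clusters) > 1:
        best_pair, best_score = None, -math.inf
        for i, j in combinations(range(len(clusters)), 2):
            # Average linkage: mean similarity over all cross-cluster pairs
            scores = [sim[(a, b)] for a in clusters[i] for b in clusters[j]]
            score = sum(scores) / len(scores)
            if score > best_score:
                best_pair, best_score = (i, j), score
        if best_score < threshold:
            break  # no remaining pair of clusters is similar enough to merge
        i, j = best_pair
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    # Sort members and clusters so the output order is deterministic
    return sorted(sorted(c) for c in clusters)


# Hypothetical toy embeddings, made up for illustration only
toy_embeddings = {
    "Platelet activation, signaling and aggregation": [0.90, 0.10, 0.00],
    "Formation of Fibrin Clot (Clotting Cascade)": [0.85, 0.15, 0.05],
    "Chromatin modifying enzymes": [0.05, 0.95, 0.10],
    "HDACs deacetylate histones": [0.10, 0.90, 0.00],
    "Pentose phosphate pathway": [0.00, 0.10, 0.95],
}

for cluster in cluster_pathways(toy_embeddings, threshold=0.8):
    print(cluster)
```

With these toy vectors, the two coagulation pathways merge, the two chromatin pathways merge, and the pentose phosphate pathway remains a singleton, because its similarity to every other cluster falls well below the threshold. Average linkage is used here only because it avoids the chaining effect of single linkage; the study may use a different algorithm.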
Notably, our approach demonstrated the value of contextualizing pathways. For example, the “pentose phosphate pathway” was grouped with iron metabolism and autophagy pathways, emphasizing its role in providing NADPH for iron metabolism, which is relevant to AML. EnrichmentMap missed this insight because it left this pathway unclustered. Additionally, “nuclear events stimulated by ALK signaling in cancer” was grouped with oncogenic signaling pathways, reflecting the role of ALK in nuclear processes that drive survival, transformation, and escape from apoptosis. Overall, this pilot study underscores the potential of LLM-generated contextual pathway summaries and embeddings to produce biologically coherent pathway clusters that highlight the pathways' collective roles. Our approach represents a novel strategy for enhancing the interpretability of pathway analysis results, particularly in complex disease contexts such as cancer.

Citation Format: Yibing Guo, Yanhao Tan, Li-Ju Wang, Chien-Hung Shih, Yu-Chiao Chiu. Context-aware pathway enrichment clustering and interpretation using large language models [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2025; Part 1 (Regular Abstracts); 2025 Apr 25-30; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2025;85(8_Suppl_1):Abstract nr 6313.