Abstract

Ovarian cancer is a challenging disease that often recurs after initial treatment. The complexity arises from simultaneous disruptions in multiple signaling pathways within tumor cells, which prevent targeted inhibitors from providing a sustained therapeutic effect. While targeted sequencing panels can identify specific therapeutically relevant events, they often fail to capture the complete picture of targetable tumor-driving aberrations or to provide biomarkers relevant to disease resistance and/or recurrence. A potential solution lies in comprehensively defining the landscape of signaling pathway alterations by identifying the molecular triggers underlying their dysregulation. Although an individual DNA mutation may not by itself lead to carcinogenesis or change tumor cell behavior, a combination of deactivating mutations in suppressor genes can influence tumor development and prognosis, which warrants a comprehensive delineation of their effect at the level of the broad signaling network. To this end, we propose applying large language models (LLMs), which have been successful in natural language processing, particularly in understanding context. Such tools are essential for discerning the effect of mutation combinations, their physical proximity to other mutations, and their location relative to functional gene regions.

We obtained mutation annotation files for 177 ovarian cancer patients from the TCGA database and reconstructed consensus nucleotide sequences using a reference genome. To input these data into LLMs, we generated embeddings for the sequences using the Nucleotide Transformer model, pre-trained on 500 million variants of the reference genome. Additionally, we extracted from TCGA the data on detected disruptions in 10 signaling pathways (Cell Cycle, HIPPO, MYC, NOTCH, NRF2, PI3K, RTK RAS, TP53, TGF-Beta, WNT) for these patients. Each pathway was associated with the percentage of its alteration.
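The sequence reconstruction step can be illustrated with a minimal sketch. It assumes that each mutation record has already been reduced to a 1-based position, a reference allele, and an alternate allele, and that only same-length substitutions are applied. The names `Variant` and `apply_variants` are hypothetical and the toy sequence is invented for illustration; this is not the authors' pipeline.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical, simplified mutation record (not the full MAF schema)."""
    position: int  # 1-based coordinate within the reference sequence
    ref: str       # reference allele expected at that position
    alt: str       # alternate allele observed in the patient


def apply_variants(reference: str, variants: list[Variant]) -> str:
    """Return the reference sequence with substitution variants applied.

    Assumption: only same-length substitutions are handled, so coordinates
    never shift. Insertions and deletions would need offset tracking.
    """
    seq = list(reference)
    for v in sorted(variants, key=lambda item: item.position):
        start = v.position - 1
        end = start + len(v.ref)
        if len(v.alt) != len(v.ref):
            raise ValueError(f"only substitutions are supported: {v}")
        if reference[start:end] != v.ref:
            raise ValueError(f"reference allele mismatch at {v.position}")
        seq[start:end] = v.alt
    return "".join(seq)


# Toy example: two single-nucleotide substitutions on an invented sequence.
reference = "ATGGCCTTAGC"
variants = [Variant(4, "G", "T"), Variant(9, "A", "C")]
patient_sequence = apply_variants(reference, variants)
```

Checking the reference allele before each substitution guards against coordinate or genome-build mismatches between the mutation file and the reference, which would otherwise silently corrupt the reconstructed sequence.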
We trained a deep learning model on these data to predict a set of 10 numerical values, each corresponding to the percentage of disruption for a given signaling pathway. Nearly half of the patients exhibited disruptions in 5 or more signaling pathways. The resulting algorithm estimates the extent of disruption in each pathway from whole exome sequencing results. This approach can support more informed treatment strategy planning and more efficient development of new drugs for ovarian cancer treatment.

Citation Format: Dmitrii K. Chebanov, Nadezhda S. Tatevosova. Application of large language models to nucleotide sequences for profiling signaling pathway disruptions in ovarian cancer patients [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 3522.