Abstract

Advancements in genotyping and sequencing techniques, coupled with the growing size of genome databases, have led to comprehensive investigations into the impact of genetic variants on human diseases, including cancers. It is commonly acknowledged that genetic variants have homogeneous effects on clinical outcomes, i.e., variants within a certain group exhibit similar effects that are distinct from those of variants in other groups. However, such grouping information is typically unknown, and identifying the underlying groups of genetic variants is crucial for understanding their functional consequences, facilitating patient management, and improving risk prediction. To address this challenge, we introduce a novel statistical framework called Survival-based Clustering of Predictors (SCP) in Cox regression to group genetic variants based on patient survival outcomes. Built upon a penalized Cox regression model, SCP treats different genetic variants, as well as additional patient-specific features if needed, as survival predictors and then searches for homogeneity among the coefficients of individual variants to recover the underlying grouping structure. Focusing on TP53, the most frequently mutated gene in cancer, whose mutations lead to a wide range of functions and clinical outcomes, we apply SCP to group TP53 germline mutations with the age of cancer diagnosis as the time-to-event. Using datasets from four Li-Fraumeni syndrome (LFS) cohorts at MD Anderson, NCI, and DFCI, we obtained 75 recurring TP53 germline mutations from 513 patients and clustered these mutations into high-, medium-, and low-risk groups. Hotspot mutations such as R175H, R248Q, G245S, and R273C are in the high-risk group, as supported by many TP53 studies, while several non-hotspot mutations such as R290H, V218G, and G244D also exhibit strong positive effects and are grouped together with the hotspot mutations.
Overall, the mutations in all three risk groups obtained by applying SCP are highly consistent with a yeast functional assay-based approach for grouping TP53 mutations, supporting the utility of SCP for survival-based grouping of genetic variants. The contribution of our study is three-fold. Statistically, our method fills a critical gap in the survival analysis literature, offering a novel statistical framework for clustering predictors in Cox regression based on survival outcomes. Clinically, we provide a timely solution to the patient outcome-based grouping of TP53 mutations, facilitating clinical management of TP53 mutation carriers. Biologically, we bring fresh insights into the most frequently observed genetic variants in cancer, providing new hypotheses for TP53-related biological research.

Citation Format: Xiaoqian Liu, Haoming Shi, Emilie Montellier, Pierre Hainaut, Wenyi Wang. Survival-based grouping of genetic variants: A novel statistical framework with an application to TP53 mutations [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 3564.
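The abstract describes SCP as a penalized Cox regression that searches for homogeneity among variant coefficients, but it does not state the form of the penalty or the fitting algorithm. The sketch below is therefore only an illustration under stated assumptions: it uses an L1 pairwise fusion penalty as one plausible way to encourage equal coefficients, a Breslow-style partial likelihood that assumes no tied event times, and hypothetical function names (`neg_log_partial_likelihood`, `pairwise_fusion_penalty`, `penalized_objective`, `groups_from_coefficients`) that do not come from the authors' work. It evaluates the objective and reads groups off a fitted coefficient vector; it does not implement an optimizer.

```python
import numpy as np


def neg_log_partial_likelihood(beta, X, time, event):
    """Negative Cox partial log-likelihood (assumes no tied event times)."""
    beta = np.asarray(beta, dtype=float)
    X = np.asarray(X, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    eta = X @ beta  # linear predictor, one value per patient
    total = 0.0
    for i in np.flatnonzero(event):
        at_risk = time >= time[i]  # patients still at risk at this event time
        total += eta[i] - np.log(np.sum(np.exp(eta[at_risk])))
    return -total


def pairwise_fusion_penalty(beta, lam):
    """Assumed penalty: lam * sum over pairs j < k of |beta_j - beta_k|."""
    beta = np.asarray(beta, dtype=float)
    diffs = np.abs(beta[:, None] - beta[None, :])
    return lam * np.sum(np.triu(diffs, k=1))  # count each pair once


def penalized_objective(beta, X, time, event, lam):
    """Objective a homogeneity-pursuing Cox fit could minimize (illustrative)."""
    return (neg_log_partial_likelihood(beta, X, time, event)
            + pairwise_fusion_penalty(beta, lam))


def groups_from_coefficients(beta, tol=1e-6):
    """Label predictors whose sorted coefficients differ by at most tol.

    Labels increase with the coefficient, so label 0 is the lowest-effect group.
    """
    beta = np.asarray(beta, dtype=float)
    order = np.argsort(beta)
    labels = np.empty(len(beta), dtype=int)
    current = 0
    labels[order[0]] = current
    for prev, nxt in zip(order[:-1], order[1:]):
        if beta[nxt] - beta[prev] > tol:
            current += 1  # a gap larger than tol starts a new group
        labels[nxt] = current
    return labels
```

In this toy setup, each column of `X` would be an indicator for carrying one variant, `time` the age at cancer diagnosis, and `event` whether a diagnosis was observed. A larger `lam` pushes more coefficient pairs toward equality, which is what would produce a small number of risk groups.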