Abstract
Modern sequencing technologies provide a comprehensive molecular portrait of human cancers. There is a strong need for methods that not only improve patient prognosis prediction but also clarify the driving factors for treatment. However, the high-dimensional, low-sample-size nature of genomic data poses challenges for typical machine learning algorithms. One way to handle this limitation is to analyze genes systematically with respect to a network, such as a protein-protein interaction (PPI) network. Nonparametric analysis of geometric properties such as Ollivier-Ricci curvature and the associated invariant measure, as developed by our group, has proven successful for predicting survival in multiple cancers. In this work, we propose a novel supervised deep learning approach that incorporates these geometric methods, benefiting from the flexibility of deep learning techniques while preserving much of the interpretability of the geometric analysis. We take advantage of a state-of-the-art graph neural network approach. Sparse connections between layers are inspired by the known biology of the PPI network from the Human Protein Reference Database (HPRD) and pathway information from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database, supplemented with geometric network features that are fed into the network at the corresponding layers. The prediction is based on a local-global principle, in which highly predictive features are selected from early layers of the network and fed directly to the final layer to produce a multivariable Cox regression. We applied our method to RNA-Seq gene expression data from the CoMMpass study of multiple myeloma (MM). Specifically, the 657 patients in the data set were randomly divided into training, validation, and set-aside testing sets at a ratio of 6:2:2. We obtained an average C-index of 0.66 on the testing set from a 10-fold data split.
Dichotomizing the testing set at its mean value into high-risk and low-risk groups yielded a significant log-rank test in the set-aside data (p = 3e-4). Geometric protein network information not only improved outcome prediction (performance was 6% worse without geometric feature inputs) but also made the model more robust to fold splitting. From our model, we identified WEE1, CENPE, and CENPF as the top genes driving survival differences: higher expression of WEE1 increased risk, and lower negative curvature between CENPE and CENPF increased risk. WEE1 is a cell cycle-related gene that regulates DNA repair, and CENPE and CENPF are components of a fibrous layer of mitotic kinetochores; these genes have been reported in the literature to be related to prognosis and to be possible targets for treatment. Although it is therefore logical that these genes would be implicated in the natural history of MM, they were identified entirely on the basis of network analysis.
Citation Format: Jiening Zhu, Jung Hun Oh, Anish K. Simhal, Rena Elkin, Larry Norton, Joseph O. Deasy, Allen R. Tannenbaum. Deep neural networks using protein-protein network information predict multiple myeloma survival. [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 5367.
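For readers unfamiliar with the two evaluation quantities reported in the abstract, the following is a minimal sketch of how a C-index and a mean-dichotomized log-rank test can be computed. It is not the authors' code: the function names `concordance_index` and `logrank_p_value` are hypothetical, the C-index uses the common Harrell definition (the abstract does not say which variant was used), the split is assumed to be on the predicted risk score, and the toy data are invented for illustration and unrelated to CoMMpass.

```python
import math
import numpy as np


def concordance_index(time, event, risk):
    """Harrell-style C-index (simplified; pairs with tied times are skipped).

    A pair (i, j) is comparable when subject i had an observed event and a
    strictly shorter time than subject j. The pair is concordant when the
    subject with the shorter time has the higher predicted risk; ties in
    predicted risk count as 0.5.
    """
    time, event, risk = map(np.asarray, (time, event, risk))
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(len(time)):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


def logrank_p_value(time, event, group):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    time = np.asarray(time)
    event = np.asarray(event).astype(bool)
    group = np.asarray(group).astype(bool)
    obs_minus_exp, variance = 0.0, 0.0
    for t in np.unique(time[event]):  # loop over distinct event times
        at_risk = time >= t
        n = at_risk.sum()                      # subjects at risk overall
        n1 = (at_risk & group).sum()           # at risk in group 1
        d = ((time == t) & event).sum()        # events at time t overall
        d1 = ((time == t) & event & group).sum()  # events in group 1
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:  # hypergeometric variance is undefined for n == 1
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


# Invented toy data: follow-up times, event indicators (1 = event observed),
# and model risk scores. These values are assumptions for illustration only.
toy_time = np.array([2, 3, 5, 7, 8, 9, 12, 15])
toy_event = np.array([1, 1, 1, 0, 1, 1, 0, 0])
toy_risk = np.array([2.1, 1.7, 1.9, 0.4, 0.9, 1.2, 0.3, 0.1])

# Assumed reading of the dichotomization: split at the mean predicted risk.
high_risk = toy_risk > toy_risk.mean()

print("C-index:", concordance_index(toy_time, toy_event, toy_risk))
print("log-rank (chi2, p):", logrank_p_value(toy_time, toy_event, high_risk))
```

In practice a maintained survival-analysis library would be preferable to these hand-written loops; the sketch only makes the definitions concrete. Note that `logrank_p_value` assumes both groups contain subjects at risk at some event time, otherwise the variance is zero.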