Abstract

Background: The ImmunogenomiC prOfiling of Non-small cell lung cancer (NSCLC) Project (ICON) represents an ambitious undertaking to comprehensively characterize immuno-genomic diversity in NSCLC across diverse platforms. The depth and breadth of this cohort presented a unique opportunity to develop a specialized method for multi-platform data integration and exploration, which can be broadly applied to forthcoming large-scale patient profiling studies. Such a holistic approach can unlock insights for therapeutic targets, biomarkers, and treatment plans by providing a more complete view of phenomena driving disease pathogenesis and evolution.

Purpose: We developed a novel shared nearest neighbors (SNN) approach to create an integrated network of ICON’s multi-platform data and identified collections of closely related measurements within the resulting network tied to noteworthy patient characteristics, including recurrence and oncogenotype.

Methods: The ICON dataset is derived from tumor and normal lung tissue samples collected from 150 patients at the time of resection, as well as blood samples collected then and at intervals during the following year. Tissue samples underwent RNA-sequencing (RNA-seq), whole exome sequencing, T-cell receptor sequencing, multiplex immunofluorescence for immune cells, and reverse phase protein array profiling; flow cytometry for immune cells was performed on tissue and blood samples. From these data, the ICON data network was built using an integrative approach based on the SNN algorithm in which genes were linked on the basis of their shared top correlates in orthogonal datasets.

Results: The ICON data network currently includes over 20,000 genes linked by over 500,000 connections derived from correlations between RNA-seq and orthogonal platforms. We captured established associations between cancer-related genes and examined these along with new ones in the network. To do so, we used the InfoMap algorithm to extract more interpretable sub-networks, termed modules, from the ICON data network. Single sample gene set enrichment scores for each module were used in multivariate analysis to highlight modules linked to clinical characteristics of interest. As an example, we found modules significantly tied to disease recurrence. The most notable of these was strongly associated with metabolic pathways, and other modules associated with platelets and ion channels were also identified. The metabolic pathway module is being explored as a prognostic biomarker, underscoring the opportunities enabled by mining the network.

Conclusions: Through the framework developed, we identified modules in the ICON data network significantly associated with important patient characteristics like recurrence and oncogenotype. We are validating the gene sets identified as potential biomarkers and are developing an interactive application to facilitate further mining of the network. Taken together, our SNN network-building approach enables the integration and exploration of patient data from diverse platforms.

Citation Format: Stephanie T. Schmidt, Neal Akhave, Alexandre Reuben, Tina Cascone, Jianhua Zhang, Jun Li, Junya Fujimoto, Lauren A. Byers, Beatriz Sanchez-Espiridion, Lixia Diao, Jing Wang, Lorenzo Federico, Marie-Andree Forget, Daniel J. McGrail, Annikka Weissferdt, Shiaw-Yih Lin, Younghee Lee, Natalie Vokes, Carmen Behrens, Ignacio I. Wistuba, Andrew Futreal, Ara Vaporciyan, Boris Sepesi, John V. Heymach, Chantale Bernatchez, Cara Haymaker, Jianjun Zhang, Christopher A. Bristow, Timothy P. Heffernan, Marcelo V. Negrao, Don L. Gibbons. A shared nearest neighbors approach for integrated, multi-platform networks and its application to the exploration of multiomics data from early-stage non-small cell lung cancers [abstract]. In: Proceedings of the AACR-NCI-EORTC Virtual International Conference on Molecular Targets and Cancer Therapeutics; 2021 Oct 7-10. Philadelphia (PA): AACR; Mol Cancer Ther 2021;20(12 Suppl):Abstract nr P009.
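
Illustrative sketch: The Methods describe linking genes on the basis of their shared top correlates in orthogonal datasets. The abstract does not give the correlation measure, the number of top correlates, or the sharing threshold, so the following is only a minimal sketch under stated assumptions and not the authors' implementation. It assumes absolute Pearson correlation, sample-matched matrices with no constant columns, and hypothetical names and parameters (`top_correlates`, `snn_edges`, `k`, `min_shared`).

```python
import numpy as np

def top_correlates(expr, ortho, k):
    """For each gene (column of expr), return the indices of the k features
    (columns of ortho) with the largest absolute Pearson correlation.
    Assumption: rows are the same samples in both matrices."""
    # Standardize columns so a dot product over samples yields Pearson r.
    ez = (expr - expr.mean(axis=0)) / expr.std(axis=0)
    oz = (ortho - ortho.mean(axis=0)) / ortho.std(axis=0)
    corr = ez.T @ oz / expr.shape[0]  # shape: genes x features
    # Sort features by descending |r| and keep the first k per gene.
    return np.argsort(-np.abs(corr), axis=1)[:, :k]

def snn_edges(top, min_shared):
    """Link two genes when their top-correlate sets share at least
    min_shared features (hypothetical threshold). Returns
    (gene_i, gene_j, n_shared) tuples."""
    sets = [set(row.tolist()) for row in top]
    edges = []
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            shared = len(sets[i] & sets[j])
            if shared >= min_shared:
                edges.append((i, j, shared))
    return edges
```

In this sketch, `expr` would stand in for an RNA-seq matrix (samples by genes) and `ortho` for one orthogonal platform (samples by features); the resulting edge list could then be passed to a community detection step. The pairwise loop is quadratic in the number of genes, so a network of the size reported here would need a more scalable formulation.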