2037 Background: Central nervous system (CNS) metastasis is a major cause of cancer death and morbidity, but the clinicogenomic covariates of CNS metastasis have been studied only in small cohorts. We sought to (i) determine whether models predicting patient time to CNS metastasis (ttCNS), trained on a large, automatically annotated clinicogenomic dataset, could stratify ttCNS risk in an external, manually curated cohort and (ii) use these data to study the genomic risk factors for metastasis at scale. Methods: We leveraged the AACR Project GENIE Biopharma Collaborative (BPC), a structured curation of electronic health records at four cancer centers using the PRISSMM method, to train natural language processing (NLP) algorithms to annotate metastatic sites from radiology reports. We applied these algorithms to all reports for MSK patients whose tumors were sequenced on our FDA-authorized targeted sequencing platform. We used the resulting clinicogenomic data to train random survival forests (RSFs) to predict radiographically confirmed ttCNS from the time of sample acquisition for patients with non-small cell lung cancer (NSCLC, N = 7,263), breast cancer (BRC, N = 5,195; HR+ N = 4,050, HER2+ N = 879, triple-negative (TNBC) N = 866), and colorectal cancer (CRC, N = 4,320). Model variables were stage, gene-level pathogenic alterations, pre-existing metastatic sites, histopathology, prior and current treatment, and patient demographics; patients who reached the endpoint before sample acquisition were excluded. We also predicted time to bone, liver, and adrenal metastases. RSFs were validated in the manually curated, non-MSK BPC cohort. Results: RSFs had predictive power for ttCNS in validation datasets (NSCLC c-index: 0.66, BRC: 0.71 (HR+ only: 0.71, HER2+ only: 0.69, TNBC: 0.62), CRC: 0.67, all p < 0.001). Pre-existing metastatic involvement and genomic, histopathologic, and clinical features each contributed non-overlapping information for predicting ttCNS.
We explored genomic covariates of ttCNS and of time to metastasis at the other sites using Cox proportional hazards models adjusted for disease stage. Within individual cancer types, the hazard ratios of gene-level alterations for the four metastatic sites considered were correlated (Pearson R = 0.71 to 0.98); in all cancer types the highest correlations were between ttCNS and time to adrenal metastasis. Across cancer types, the effects of genomic alterations on time to metastasis were less correlated (R = -0.22 to 0.48). For example, CDKN2A/B and MYC alterations shortened ttCNS in NSCLC and HR+ BRC but not in HR- BRC. PTEN alterations were associated with shortened ttCNS in TNBC and NSCLC but not in CRC or other breast subtypes. Conclusions: Automatically annotated cohorts provide a means of studying drivers of metastasis at scale. Pre-existing non-CNS metastatic sites are associated with shorter ttCNS. Genomic alterations predisposing to CNS metastases frequently predispose to metastases in other organs, although in general the genomics of organotropism are highly cancer-specific.
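The c-index values reported in the Results measure how often a model ranks patients correctly: among comparable pairs, the patient who reaches CNS metastasis sooner should receive the higher predicted risk (0.5 is chance, 1.0 is perfect ranking). The abstract does not state which estimator was used, so the following is only a minimal sketch of Harrell's concordance index for right-censored data. The function name, argument names, and toy values are hypothetical and are not taken from the study.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data (illustrative sketch).

    times       -- observed time to event or to censoring, per patient
    events      -- 1/True if the event was observed, 0/False if censored
    risk_scores -- model output, higher meaning shorter expected time to event
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs with tied times are skipped
        # a is the patient with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the true ordering is unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier event received the higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: months to CNS metastasis, event flags, predicted risk.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.7, 0.4, 0.1]
toy_c_index = harrell_c_index(toy_times, toy_events, toy_risk)
```

In this toy example every comparable pair is ranked correctly, so `toy_c_index` is 1.0. The reported values of 0.62 to 0.71 correspond to roughly two of every three comparable pairs being ordered correctly.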