3116

Background: Histopathologic assessment has been the primary modality for the diagnosis of human cancers since the 19th century, and to this day remains the mainstay of diagnosis, risk stratification, and staging. While the field has made significant advances, the "art of pathology" relies heavily on subjective visual inspection, with significant inter-observer variability and, at times, diagnostic uncertainty. Advances in next-generation sequencing have ushered in new molecular diagnostic frameworks that can improve diagnostic accuracy, which is critical in deciding treatment.

Methods: We used machine learning (XGBoost) to develop comprehensive molecular classifiers for cancer site of origin (22 classes, e.g., breast, prostate, and lung) and cancer lineage (8 classes, e.g., adenocarcinoma and squamous cell carcinoma) to augment traditional histopathologic assessment. These models were trained on a total of 8,249 tumor samples. We then evaluated performance using a large independent validation cohort consisting of 10,376 samples. While 8,886 of these were primary tumors from 97 different datasets, we also, uniquely, assessed performance in 1,490 metastatic tumors from 17 datasets. Pathologic diagnosis of metastatic tumors can be more difficult due to de-differentiation, but metastatic samples for molecular profiling are difficult to obtain and were only rarely included in previous efforts.

Results: After model training, we locked the expression-based site of origin and lineage models and then evaluated performance on our independent validation cohort. Overall accuracy on validation was 92.5% for cancer site of origin and 97.2% for cancer lineage. Accuracy was higher among primary samples than among metastatic samples (93.4% vs. 86.8% for cancer site of origin; 97.5% vs. 95.7% for cancer lineage). However, accuracy rose to 98-99% for both site of origin and lineage, in both primary and metastatic samples, among cases with high-confidence predictions, which made up the majority of cases. Our approach of evaluating site of origin and lineage separately allowed us to identify a lineage differentiation score that was associated with small cell lung and neuroendocrine prostate tumors (AUC 0.955 and 0.833, respectively), as well as with worse survival across most evaluable tumor types.

Conclusions: To our knowledge, this is the largest and most comprehensive validation of platform-independent site of origin/lineage classifiers to date. Our approach can be applied to any existing research or commercial RNA-seq assay, and provides an objective, quantifiable confidence measurement that is correlated with accuracy. This allows for nuanced interpretation of high- vs. low-confidence predictions, which can complement and potentially even help guide traditional histopathologic assessment in cancer research, clinical trial design, and ultimately clinical practice.
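
The comparison of high- vs. low-confidence predictions can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that confidence is the largest predicted class probability, and the function name `accuracy_by_confidence`, the 0.9 cutoff, and the toy data are all hypothetical. The abstract does not state how confidence was defined or thresholded.

```python
from typing import Dict, Optional, Sequence


def accuracy_by_confidence(
    probabilities: Sequence[Sequence[float]],
    true_labels: Sequence[int],
    threshold: float = 0.9,  # hypothetical cutoff, not taken from the abstract
) -> Dict[str, Dict[str, Optional[float]]]:
    """Report sample count and accuracy separately for high- and low-confidence predictions.

    Assumption: confidence is the maximum predicted class probability.
    """
    counts = {"high": [0, 0], "low": [0, 0]}  # group -> [correct, total]
    for probs, truth in zip(probabilities, true_labels):
        # Predicted class is the index of the largest probability.
        predicted = max(range(len(probs)), key=probs.__getitem__)
        confidence = probs[predicted]
        group = "high" if confidence >= threshold else "low"
        counts[group][0] += int(predicted == truth)
        counts[group][1] += 1
    return {
        group: {"n": total, "accuracy": (correct / total) if total else None}
        for group, (correct, total) in counts.items()
    }


# Toy class probabilities for four samples and three classes (illustrative only).
toy_probs = [
    [0.95, 0.03, 0.02],
    [0.05, 0.92, 0.03],
    [0.40, 0.35, 0.25],
    [0.30, 0.45, 0.25],
]
toy_truth = [0, 1, 1, 1]

result = accuracy_by_confidence(toy_probs, toy_truth)
print(result)
```

In practice the probability rows would come from a trained multiclass classifier, and the cutoff would be fixed before looking at the validation cohort so that the reported high-confidence accuracy is not tuned to it.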