Abstract 5391: Machine learning of cancer type and tissue of origin from proteomes of 1,277 human tissue samples and 975 cancer cell lines

Zhaoxiang Cai, Zainab Noor, Rebecca C Poulos, Phillip J Robinson, Dylan Xavier, Qing Zhong, Peter G Hains, Roger R Reddel, Adel T Aref, Natasha Lucas, Steven G Williams, Jennifer Koh, Emma Boys, Rosemary L Balleine

doi:10.1158/1538-7445.am2023-5391

Abstract

Introduction: Cancer type is determined through tumor morphology, aided by immunohistochemical staining. The development of machine learning (ML) models using histology slides has powered the image-based prediction of the site of origin in cancer of unknown primary (CUP). Here, we used ML on proteomic data to predict cancer type and tissue of origin in a cohort of 1,277 human tissue samples spanning 44 cancer types. The training proteome datasets included two independent sets of proteomes acquired from a pan-cancer cell line collection and a subset of the tissue cohort for online ML.

Methods: All samples were processed using data-independent acquisition mass spectrometry (DIA-MS). Two proteomic profiles of the pan-cancer cell line cohort were generated using two independent sample preparation methods. These were normalized with ComBat and merged by averaging protein abundance, yielding a single training set (D1) with 975 cell lines and 9,688 proteins. Similarly, 1,277 tissue samples were processed by DIA-MS, quantifying 9,501 proteins. Celligner was used to align the cell lines (D1) with the tissue cohort. Half of the tissue proteomes were used as a second training set (D2) for online ML, and the other half formed a hold-out test set (T1). Multinomial logistic regression was used to predict cancer and tissue types. The evaluation metric was top-k accuracy, which measures how often the correct cancer or tissue type class is among the top k predicted classes.

Results: As a proof of concept, we defined six cancer types (adenocarcinoma, sarcoma, squamous carcinoma, lymphoma, melanoma and small cell carcinoma) and seven adenocarcinoma tissues of origin (breast, colorectal, liver, lung, ovary, stomach/esophagus and pancreas) for an ML experiment. We trained a classifier using the cell lines (D1) as the baseline training set, and consecutively added 10% of D2 to D1 for online ML.
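The stepwise schedule described here can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the 10% additions are cumulative (so the last step uses all of D2), and the function name `stepwise_training_sets`, the shuffling, the seed, and the toy sample identifiers are all hypothetical. The abstract does not say whether the classifier is refit from scratch or updated incrementally at each step, so the sketch only builds the training sets.

```python
import random


def stepwise_training_sets(d1, d2, n_steps=10, seed=0):
    """Yield (fraction_of_d2_used, training_set) pairs.

    The first pair is the baseline (d1 only). Each later pair adds one more
    1/n_steps slice of a shuffled copy of d2, so the last pair uses all of d2.
    """
    rng = random.Random(seed)
    pool = list(d2)
    rng.shuffle(pool)  # assumption: the order in which tissue samples are added is random
    for step in range(n_steps + 1):
        cutoff = round(len(pool) * step / n_steps)
        yield step / n_steps, list(d1) + pool[:cutoff]


# Hypothetical sample identifiers; the sizes are illustrative, not the study's.
cell_lines = [f"cell_{i}" for i in range(20)]
tissues = [f"tissue_{i}" for i in range(30)]

schedule = list(stepwise_training_sets(cell_lines, tissues))
sizes = [len(train) for _, train in schedule]
print(sizes)  # [20, 23, 26, 29, 32, 35, 38, 41, 44, 47, 50]
```

Under this reading there are eleven training sets in total: the cell-line baseline plus ten augmented sets, each of which would be used to train a model that is then scored on the same hold-out set.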
We tested the baseline model and each subsequent model on the test set T1. Top-1 accuracy increased monotonically from 0.89 (baseline) to 0.97 (all of D2 used) when predicting the six cancer types. We observed an analogous trend when predicting the seven tissue types (from 0.64 to 0.84). These results suggest that cancer cell lines can be used to predict cancer type and adenocarcinoma tissue of origin.

Conclusion: Our proteomic-based ML model can predict cancer type and adenocarcinoma tissue of origin in concordance with existing histopathological classification. It can also assign multiple probabilities to tumor type and tissue of origin, potentially enabling the classification of CUP in future work. Adding tissue samples stepwise to the existing model further enhances its predictive performance. This reflects a real-world knowledgebase that will continue to increase in predictive power as additional data are added.

Citation Format: Zhaoxiang Cai, Zainab Noor, Adel T. Aref, Emma L. Boys, Dylan Xavier, Natasha Lucas, Steven G. Williams, Jennifer M. Koh, Rebecca C. Poulos, Peter G. Hains, Phillip J. Robinson, Rosemary Balleine, Roger R. Reddel, Qing Zhong. Machine learning of cancer type and tissue of origin from proteomes of 1,277 human tissue samples and 975 cancer cell lines [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 5391.
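The top-k accuracy metric defined in the Methods can be illustrated with a short sketch. The function name `top_k_accuracy`, the dictionary-based input format, and the toy probabilities are assumptions made for illustration; only the class names are taken from the abstract.

```python
def top_k_accuracy(y_true, class_scores, k=1):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    y_true: list of true class labels.
    class_scores: list of dicts mapping class label -> predicted probability.
    """
    if not y_true or len(y_true) != len(class_scores):
        raise ValueError("y_true and class_scores must be non-empty and of equal length")
    hits = 0
    for label, scores in zip(y_true, class_scores):
        # Class labels ordered from highest to lowest predicted probability.
        top_k = sorted(scores, key=scores.get, reverse=True)[:k]
        hits += label in top_k
    return hits / len(y_true)


# Toy predictions over three of the cancer types named in the abstract;
# the probabilities are made up for illustration.
y_true = ["sarcoma", "melanoma", "lymphoma", "sarcoma"]
class_scores = [
    {"sarcoma": 0.7, "melanoma": 0.2, "lymphoma": 0.1},
    {"sarcoma": 0.5, "melanoma": 0.4, "lymphoma": 0.1},
    {"sarcoma": 0.2, "melanoma": 0.5, "lymphoma": 0.3},
    {"sarcoma": 0.1, "melanoma": 0.3, "lymphoma": 0.6},
]

print(top_k_accuracy(y_true, class_scores, k=1))  # 0.25
print(top_k_accuracy(y_true, class_scores, k=2))  # 0.75
```

With k = 1 the metric reduces to ordinary accuracy, which is the Top-1 figure reported in the Results; larger k credits a prediction whenever the correct class appears anywhere among the k most probable classes.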
