Abstract

3120

Background: Accurate tumour classification by tissue of origin (TOO) remains important for guiding treatment selection and estimating prognosis, but can be challenging in patients with poorly differentiated malignancy, cancer of unknown primary (CUP) or prior malignancy. Data-independent acquisition mass spectrometry (DIA-MS)-based proteomics is emerging as a potential clinical diagnostic and prognostic tool. We aimed to develop a protein-based signature that identifies histological subtype and adenocarcinoma TOO from DIA-MS data obtained in a pan-cancer study of human tissue samples, for use as an adjunct to histopathological assessment in challenging clinical scenarios.

Methods: We performed DIA-MS-based proteomic profiling of 795 fresh frozen tumour and 494 tumour-adjacent normal samples from the Victorian Cancer Biobank in a clinically orientated workflow. We filtered the cohort to include tumour types relevant to CUP. Protein quantification was derived from raw peptide intensity data. Random forest classifiers to identify histological subtype and adenocarcinoma TOO were then trained on 70% of the data and tested on the remaining 30%. Evaluation metrics included top-k accuracy (how often the correct class is among the top k predicted classes) and one-versus-rest area under the receiver operating characteristic curve (AUROC), reported ± 95% confidence interval.

Results: The final tumour cohort consisted of 427 tumour samples representing eight histological subtypes (adenocarcinoma, germ cell tumour, lymphoma, melanoma, renal cell carcinoma, sarcoma, squamous cell carcinoma, thyroid carcinoma) and seven adenocarcinoma TOO (breast, colorectal, liver, lung, ovary, pancreas, prostate). Of 9,051 quantified proteins, 83 were identified as potentially useful for identifying histological subtype and adenocarcinoma TOO in machine learning models. The histological subtype model identified cancer subtype in the test set with top-1 and top-2 accuracy of 0.95 ± 0.03 and 0.98 ± 0.02 respectively. Average test AUROC over all cancer types (n=8) was 0.98 ± 0.02. The adenocarcinoma TOO model identified tumour TOO in the test set with top-1 and top-2 accuracy of 0.88 ± 0.07 and 0.95 ± 0.04 respectively. Average test AUROC over all adenocarcinoma TOO (n=7) was 0.97 ± 0.02.

Conclusions: Our clinically orientated DIA-MS-based proteomic workflow, combined with supervised machine learning, can identify protein signatures that classify histological subtype and TOO in tumour samples with high accuracy. This technology may assist diagnostic classification of cancer in challenging clinical scenarios, such as CUP.
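The two evaluation metrics named in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy class labels and the hand-written probabilities are all hypothetical. It assumes that the reported "average" AUROC is an unweighted (macro) mean of the per-class one-versus-rest values, which the abstract does not state. In practice the per-class probabilities would come from a trained classifier such as a random forest.

```python
from typing import Dict, List, Sequence


def top_k_accuracy(y_true: Sequence[str],
                   proba: Sequence[Dict[str, float]],
                   k: int) -> float:
    """Fraction of samples whose true class is among the k highest-scored classes."""
    hits = 0
    for truth, scores in zip(y_true, proba):
        ranked = sorted(scores, key=scores.get, reverse=True)
        hits += truth in ranked[:k]
    return hits / len(y_true)


def binary_auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auroc(y_true: Sequence[str],
                    proba: Sequence[Dict[str, float]]) -> float:
    """Unweighted mean of one-versus-rest AUROC over all classes (an assumption)."""
    aucs: List[float] = []
    for cls in sorted(set(y_true)):
        labels = [1 if t == cls else 0 for t in y_true]  # this class vs the rest
        scores = [p[cls] for p in proba]
        aucs.append(binary_auroc(labels, scores))
    return sum(aucs) / len(aucs)


# Hypothetical toy test set: four samples, three adenocarcinoma TOO classes.
y_true = ["breast", "lung", "colorectal", "lung"]
proba = [
    {"breast": 0.50, "lung": 0.45, "colorectal": 0.05},
    {"breast": 0.50, "lung": 0.40, "colorectal": 0.10},  # top-1 miss, top-2 hit
    {"breast": 0.10, "lung": 0.20, "colorectal": 0.70},
    {"breast": 0.10, "lung": 0.80, "colorectal": 0.10},
]

top1 = top_k_accuracy(y_true, proba, k=1)
top2 = top_k_accuracy(y_true, proba, k=2)
auroc = macro_ovr_auroc(y_true, proba)
print(f"top-1={top1:.2f} top-2={top2:.2f} macro OvR AUROC={auroc:.3f}")
```

The second toy sample shows why top-2 accuracy can exceed top-1 accuracy: the true class is ranked second rather than first. The pairwise AUROC computation is quadratic in sample count and is written for clarity rather than speed.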
