Performance analysis of four machine learning algorithms for the accurate prediction of metastatic disease in cutaneous squamous cell carcinoma.

Tom William Andrew, Amy Louise Bowes, Penny Lovat, Aidan Rose, Iakov Bolnykh, Philip Sloan, Balraj Maan, Sabrina Noor Pia Martin, Suhari Arahliya Serendibsha Grace Fernando, Ashvati Nair

doi:10.1200/jco.2023.41.16_suppl.e13579

Abstract

e13579 Background: Cutaneous squamous cell carcinoma (cSCC) is the most common metastasising skin cancer. Whilst metastatic cSCC is rare, it accounts for a significant proportion of skin cancer-related morbidity and mortality, particularly in elderly cohorts, and so places a significant burden on healthcare services. Established cSCC tumour staging systems perform poorly at predicting metastatic risk, and there are no clinically validated prognostic biomarkers; this highlights the unmet need for novel risk stratification tools to guide clinical practice and improve outcomes for patients with advanced disease. We aimed to train four recognised machine learning (ML) algorithms on a large clinicopathological dataset of primary cSCC, with the objective of optimising an ML strategy and developing a reliable, clinically useful risk stratification tool that accurately predicts metastatic events following primary cSCC. Methods: A dataset of primary cSCC registrations was derived from the Northern Cancer Registry, UK. This identified 7003 histologically confirmed primary cSCC registered between 2010 and 2020, providing a minimum of 2 years of clinical follow-up. We conducted a retrospective analysis of standardised pathology datasets, recording clinicopathological features. The primary outcome measure was regional and/or distant metastasis. Four ML algorithms were trained on these features: a Logistic Regression Trainer, a Decision Tree Classifier, a Random Forest Classifier and a fully connected artificial neural network (ANN). The algorithms were optimised on the training data using five-fold cross-validation. Subgroup analysis was performed using mean Shapley additive explanation (SHAP) values. Results: Accuracy scoring identified the ANN as the optimal predictor of metastasis (0.94), followed by the Logistic Regression Trainer (0.82), Random Forest Classifier (0.80) and Decision Tree Classifier (0.71).
Preliminary subgroup analysis identified immunosuppression as the most sensitive risk factor for developing metastatic disease (SHAP = 0.122). Conclusions: Significant heterogeneity in current morbidity and mortality data has limited the capacity of traditional statistical models and tumour staging systems to identify very high-risk cSCC. Our findings demonstrate that ML algorithms can accurately predict metastatic events in cSCC populations. Further development of a model user interface is needed to turn this approach into a useful risk stratification tool to guide clinical practice.
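The model comparison described in the Methods can be sketched with scikit-learn. This is a minimal, hypothetical sketch, not the authors' pipeline: the data are synthetic (generated with `make_classification`, with roughly 10% positives to mimic a rare outcome), and the scikit-learn estimators are stand-ins, since the abstract does not state which software, features or hyperparameters were used. Cross-validation here only estimates accuracy; hyperparameter tuning could be added by wrapping each model in `GridSearchCV`. Because metastasis is rare, the sketch also prints the accuracy of always predicting the majority class as a reference point.

```python
"""Hypothetical sketch: compare four classifiers by five-fold cross-validated accuracy.

Synthetic data stand in for the cSCC registry; scikit-learn estimators stand in
for the (unspecified) implementations used in the study.
"""
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced binary outcome (~10% positives) standing in for metastasis.
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=4,
    weights=[0.9, 0.1], random_state=0,
)

models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # A small fully connected network; the layer sizes are arbitrary choices.
    "neural_network": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000, random_state=0),
    ),
}

# Stratified folds keep the rare positive class represented in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Mean accuracy over the five held-out folds for each model.
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    results[name] = scores.mean()

# Accuracy of always predicting the majority class ("no metastasis").
baseline = max(y.mean(), 1 - y.mean())

for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>20s}: {acc:.3f}")
print(f"{'majority baseline':>20s}: {baseline:.3f}")
```

Changing `scoring="accuracy"` to `"roc_auc"` or `"balanced_accuracy"` reports a threshold-independent or class-balanced metric with the same loop.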
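The SHAP-based subgroup analysis can be illustrated without external libraries by computing exact Shapley values for a toy model. In this hypothetical sketch the logistic model, its weights and the five data rows are invented. A feature absent from a coalition is replaced by its dataset mean (one common baseline choice), and global importance is summarised as the mean absolute Shapley value per feature. The abstract reports a "mean SHAP" value without stating whether absolute values were averaged, so that step is an assumption. Exact enumeration needs 2^(n-1) coalitions per feature, which is why the `shap` package relies on model-specific algorithms or sampling approximations for real models.

```python
"""Hypothetical sketch: exact Shapley values by enumerating feature coalitions,
then the mean absolute Shapley value per feature as a global importance score.

The model, weights and data are made up for illustration.
"""
import math
from itertools import combinations

# Hypothetical logistic model over three binary features (weights are arbitrary).
WEIGHTS = [1.2, 0.4, -0.3]
BIAS = -2.0


def predict(x):
    """Predicted probability of the positive class for feature vector x."""
    z = BIAS + sum(w * xi for w, xi in zip(WEIGHTS, x))
    return 1.0 / (1.0 + math.exp(-z))


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition S.
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Toy dataset: each row is one hypothetical patient.
data = [[1, 0, 1], [0, 1, 0], [1, 1, 1], [0, 0, 0], [1, 0, 0]]
# Reference point: the feature-wise mean of the data.
baseline = [sum(col) / len(data) for col in zip(*data)]

all_phi = [shapley_values(predict, row, baseline) for row in data]

# Global importance: mean absolute Shapley value of each feature across rows.
mean_abs_shap = [sum(abs(p[j]) for p in all_phi) / len(all_phi) for j in range(len(WEIGHTS))]
for j, score in enumerate(mean_abs_shap):
    print(f"x{j}: {score:.3f}")
```

By the efficiency property, each row's Shapley values sum to the prediction for that row minus the prediction at the baseline, which gives a quick correctness check.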
