Early detection of squamous cell carcinoma of the oral tongue using multidimensional plasma protein analysis and interpretable machine learning

Xiaolian Gu, Amir Salehi, Karin Nylander, Nicola Sgaramella, Lixiao Wang, Philip J Coates

doi:10.1111/jop.13461

Abstract

Background: Interpretable machine learning (ML) for early detection of cancer has the potential to improve risk assessment and early intervention.

Methods: Data on 261 proteins related to inflammation and/or tumor processes were analyzed in 123 blood samples collected from healthy persons, a sub‐group of whom later developed squamous cell carcinoma of the oral tongue (SCCOT). Samples from people who developed SCCOT in less than 5 years were classified as tumor‐to‐be and all other samples as tumor‐free. The optimal ML algorithm for feature selection was identified, and feature importance was computed with the SHapley Additive exPlanations (SHAP) method. Five popular ML algorithms (AdaBoost, Artificial neural networks [ANNs], Decision Tree [DT], eXtreme Gradient Boosting [XGBoost], and Support Vector Machine [SVM]) were applied to establish prediction models, and the decisions of the optimal models were interpreted with SHAP.

Results: Using the 22 selected features, the SVM prediction model showed the best performance (sensitivity = 0.867, specificity = 0.859, balanced accuracy = 0.863, area under the receiver operating characteristic curve [ROC‐AUC] = 0.924). SHAP analysis revealed that the 22 features had varying person‐specific impacts on the model decision, and the top three contributors to prediction were Interleukin 10 (IL10), TNF Receptor Associated Factor 2 (TRAF2), and Kallikrein Related Peptidase 12 (KLK12).

Conclusion: Using multidimensional plasma protein analysis and interpretable ML, we outline a systematic approach for early detection of SCCOT before the appearance of clinical signs.
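
The three threshold-based metrics reported for the SVM model are related by a simple identity: balanced accuracy is the mean of sensitivity and specificity. The following is a minimal Python sketch of those standard definitions, not the authors' code. The function names are illustrative, and the confusion-matrix counts in the second part are hypothetical because the abstract does not report per-class counts.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: fraction of tumor-to-be samples flagged as such."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negative rate: fraction of tumor-free samples left unflagged."""
    return tn / (tn + fp)


def balanced_accuracy(sens: float, spec: float) -> float:
    """Mean of sensitivity and specificity; robust to class imbalance."""
    return (sens + spec) / 2


# Consistency check on the values reported in the abstract.
reported = balanced_accuracy(0.867, 0.859)
print(round(reported, 3))  # 0.863, matching the reported balanced accuracy

# Hypothetical counts (NOT from the paper), only to show the definitions in use.
sens = sensitivity(tp=8, fn=2)   # 0.8
spec = specificity(tn=18, fp=2)  # 0.9
print(round(balanced_accuracy(sens, spec), 3))  # 0.85
```

Balanced accuracy is a natural summary for this kind of cohort, where tumor‐free samples are likely to outnumber tumor‐to‐be samples and plain accuracy could look high even for a model that rarely flags anyone.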

Full Text