9560
Background: Strategies to predict progression to metastasis in early-stage melanoma patients have relied on small sample sizes and limited sets of clinical or genomic features. Prior studies achieved good discrimination in small cohorts, but applying advanced machine learning techniques to large datasets with deep clinical and molecular data may yield tools with enhanced generalizability and clinical utility.
Methods: We employed a machine learning approach to predict the likelihood of progression to metastatic melanoma for a cohort with an initial diagnosis of stage 0-3 (n=7477), using both structured and human-curated information in the ConcertAI Patient360 melanoma EHR dataset. Patients with uveal melanoma, a second primary malignancy, or clinical trial participation were excluded. A total of 68 features, including staging, demographic, testing, biomarker, and clinical tumor information recorded within 30 days of initial melanoma diagnosis, were used to train several machine learning frameworks. Logistic regression, random forest, gradient boosting decision tree, and XGBoost frameworks were compared using the AUC on a 20% hold-out set to select the best framework after hyperparameter tuning. Additional evaluation metrics, including accuracy, precision, recall, and F1, were computed for the final model. Feature importance was assessed using SHapley Additive exPlanations (SHAP) dependence plots. A permutation test (N=1000) was used to evaluate the predictive power of the final model.
Results: An XGBoost approach produced a test AUC of 0.708 with a pseudo-p value of 0.001 from permutation. Notably, the model produced a precision of 0.709 on the hold-out set. SHAP dependence measures showed that the most important features for prediction were those involving initial staging and clinical measures of the tumor. Specifically, lower initial stage corresponded with lower predicted probability of metastatic progression. Similarly, higher values of mitotic rate and tumor thickness corresponded with higher predicted probability of progression. In addition, more complex interactions between features contributed to the improved performance of the XGBoost framework.
Conclusions: An XGBoost framework trained on a large set of clinical features for 7477 melanoma patients predicted metastatic progression with significant predictive power (p = 0.001), yielding an AUC of 0.708. The model relied heavily on staging information at initial diagnosis and on tumor size, mitotic rate, and ulceration status, features that were typically reported in unstructured EMR. These results indicate that machine learning models trained on real-world data have clinical utility for both providers and patients.
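The permutation evaluation mentioned in the Methods can be illustrated with a minimal sketch. The abstract does not spell out the procedure, so the code below assumes that outcome labels are shuffled against fixed model scores and that the pseudo-p value is computed as (1 + number of permuted AUCs at least as large as the observed AUC) / (1 + N). Under that assumption, the smallest attainable value with N=1000 is 1/1001, or about 0.001. The function names, the synthetic scores, and the seeds are hypothetical and are not taken from the study.

```python
import random


def auc(labels, scores):
    """Rank-based AUC (Mann-Whitney U), using average ranks for tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the end of the current group of tied scores.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def permutation_p_value(labels, scores, n_permutations=1000, seed=0):
    """Return (observed AUC, pseudo-p value) from a label-permutation test.

    Assumption: pseudo-p = (1 + #{permuted AUC >= observed}) / (1 + N).
    """
    rng = random.Random(seed)
    observed = auc(labels, scores)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any link between labels and scores
        if auc(shuffled, scores) >= observed:
            at_least_as_good += 1
    return observed, (1 + at_least_as_good) / (1 + n_permutations)


# Synthetic stand-in for hold-out predictions (not study data):
# about 30% positives, with scores shifted upward for positive cases.
demo_rng = random.Random(42)
demo_labels = [1 if demo_rng.random() < 0.3 else 0 for _ in range(400)]
demo_scores = [demo_rng.gauss(0.8 * y, 1.0) for y in demo_labels]

demo_auc, demo_p = permutation_p_value(demo_labels, demo_scores)
print(f"observed AUC = {demo_auc:.3f}, pseudo-p = {demo_p:.4f}")
```

Shuffling the labels while keeping the scores fixed approximates the distribution of AUC values expected when the model carries no information about the outcome. An alternative design, retraining the model on each permuted label set, is more expensive and is not shown here.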