Abstract

This study aimed to develop a risk-prediction model for second primary skin cancer (SPSC) in skin cancer survivors. We sought to identify the clinical characteristics of SPSC and to raise awareness among physicians who screen high-risk patients among skin cancer survivors. Using data on 1248 skin cancer survivors extracted from five cancer registries, we benchmarked a random forest algorithm against multilayer perceptron (MLP), C4.5, AdaBoost, and bagging algorithms on several metrics. We also applied the synthetic minority over-sampling technique (SMOTE) to address class imbalance in the dataset, cost-sensitive learning for risk assessment, and SHAP to analyze feature importance. The proposed random forest outperformed the other models, with an accuracy of 90.2%, a recall of 95.2%, a precision of 86.6%, and an F1 score of 90.7% for the SPSC class, based on 10-fold cross-validation on a balanced dataset. Our results suggest that four features, namely age, stage, gender, and involvement of regional lymph nodes, significantly affect the output of the prediction model and should be considered in subsequent causal-effect analysis. In addition to causal analysis of specific primary sites, these clinical features allow further investigation of secondary cancers among skin cancer survivors.
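As a quick consistency check on the reported metrics, the F1 score is the harmonic mean of precision and recall. The sketch below recomputes it from the precision (86.6%) and recall (95.2%) stated above; the function name `f1_from_precision_recall` is a hypothetical helper introduced only for illustration and is not taken from the study.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the F1 score, the harmonic mean of precision and recall.

    Hypothetical helper for illustration; inputs are fractions in [0, 1].
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for the SPSC class in the abstract.
reported_precision = 0.866
reported_recall = 0.952

f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(f"F1 = {f1:.3f}")
```

Under these inputs the result rounds to 0.907, which agrees with the reported F1 score of 90.7%.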
