Lung cancer is a major contributor to global cancer mortality and remains difficult to manage clinically. Early detection and accurate prognosis are crucial for improving patient outcomes. This study develops an interpretable stacking ensemble model (SEM) for lung cancer prognosis prediction and identifies key risk factors. Using a Kaggle dataset of 1000 patients with 22 variables, the model classifies prognosis into Low-, Medium-, and High-risk categories. Evaluation metrics were estimated with the bootstrap method, while SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations) were used to assess model interpretability. Results showed SEM's superior interpretability over traditional models, namely Random Forest, Logistic Regression, Decision Tree, Gradient Boosting Machine, Extreme Gradient Boosting Machine, and Light Gradient Boosting Machine. SEM achieved an accuracy of 98.90%, precision of 98.70%, F1 score of 98.85%, sensitivity of 98.77%, specificity of 95.45%, Cohen's kappa of 94.56%, and an AUC of 98.10%. The SEM demonstrated robust performance in lung cancer prognosis and identified chronic lung cancer and genetic risk as major factors.
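To make the bootstrap evaluation step concrete, the following is a minimal sketch of how percentile confidence intervals for accuracy and Cohen's kappa might be estimated for a three-class (Low, Medium, High) classifier. It is not the study's code: the helper names (`accuracy`, `cohen_kappa`, `bootstrap_ci`), the number of resamples, the 95% percentile interval, and the toy labels are all assumptions made for illustration.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    labels = set(y_true) | set(y_pred)
    p_o = accuracy(y_true, y_pred)
    # Chance agreement from the marginal label frequencies of each rater.
    p_e = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    if p_e == 1.0:
        # Degenerate resample containing a single class on both sides.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a metric (hypothetical settings)."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        # Resample patient indices with replacement, keeping pairs aligned.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(metric([y_true[i] for i in idx],
                             [y_pred[i] for i in idx]))
    scores.sort()
    lower = scores[int((alpha / 2) * n_boot)]
    upper = scores[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy labels (illustrative only, not the study's data): 90 cases, 3 errors.
y_true = ["Low"] * 30 + ["Medium"] * 30 + ["High"] * 30
y_pred = list(y_true)
y_pred[0], y_pred[30], y_pred[60] = "Medium", "High", "Low"

acc_point = accuracy(y_true, y_pred)
kappa_point = cohen_kappa(y_true, y_pred)
acc_ci = bootstrap_ci(y_true, y_pred, accuracy)
kappa_ci = bootstrap_ci(y_true, y_pred, cohen_kappa)
print(f"accuracy={acc_point:.4f}, 95% CI={acc_ci}")
print(f"kappa={kappa_point:.4f}, 95% CI={kappa_ci}")
```

Resampling whole (true, predicted) pairs keeps each patient's label and prediction together, so the interval reflects sampling variability in the test set rather than in the model's training.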