Interpretable Machine Learning for Survival Analysis

Abstract

With the spread and rapid advancement of black box machine learning (ML) models, the field of interpretable machine learning (IML) or explainable artificial intelligence (XAI) has become increasingly important over the last decade. This is particularly relevant for survival analysis, where the adoption of IML techniques promotes transparency, accountability, and fairness in sensitive areas such as clinical decision-making processes, the development of targeted therapies and interventions, and other medical or healthcare-related contexts. More specifically, explainability can uncover a survival model's potential biases and limitations and provide more mathematically sound ways to understand how and which features are influential for prediction or constitute risk factors. However, the lack of readily available IML methods may have deterred practitioners from leveraging the full potential of ML for predicting time-to-event data. We present a comprehensive review of the existing work on IML methods for survival analysis within the context of the general IML taxonomy. In addition, we formally detail how commonly used IML methods, such as individual conditional expectation (ICE), partial dependence plots (PDP), accumulated local effects (ALE), different feature importance measures, and Friedman's H-interaction statistic, can be adapted to survival outcomes. An application of several IML methods to data on breast cancer recurrence from the German Breast Cancer Study Group (GBSG2) serves as a tutorial for researchers on how to use the techniques in practice to facilitate understanding of model decisions and predictions.
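
To make the adaptation concrete, here is a minimal, hedged sketch of a survival PDP in Python: it fits a random survival forest on the GBSG2 data shipped with scikit-survival (the paper's own tutorial may use different software), forces one feature to a few grid values, and averages the predicted survival curves. Everything beyond the GBSG2 data itself, including the choice of model, feature, and grid, is an illustrative assumption.

```python
# A minimal sketch (not the paper's implementation) of a survival PDP:
# for each grid value of a feature, overwrite that column for all samples,
# average the predicted survival curves, and plot one mean S(t) per value.
import numpy as np
import matplotlib.pyplot as plt
from sksurv.datasets import load_gbsg2
from sksurv.ensemble import RandomSurvivalForest
from sklearn.preprocessing import OrdinalEncoder

X, y = load_gbsg2()                           # breast cancer recurrence data
# encode the categorical columns so the forest can consume them
cat_cols = X.select_dtypes(exclude="number").columns
X[cat_cols] = OrdinalEncoder().fit_transform(X[cat_cols])

rsf = RandomSurvivalForest(n_estimators=100, random_state=0).fit(X, y)

feature = "pnodes"                            # number of positive lymph nodes
grid = np.quantile(X[feature], [0.1, 0.5, 0.9])

for v in grid:
    X_mod = X.copy()
    X_mod[feature] = v                        # intervene on one feature only
    surv = rsf.predict_survival_function(X_mod, return_array=True)
    plt.step(rsf.unique_times_, surv.mean(axis=0), where="post",
             label=f"{feature}={v:.0f}")      # PDP: mean survival curve

plt.xlabel("time (days)"); plt.ylabel("mean predicted S(t)")
plt.legend(); plt.show()
```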

Similar Papers
  • Research Article
  • 10.1017/psy.2025.10032
Explaining Person-by-Item Responses using Person- and Item-Level Predictors via Random Forests and Interpretable Machine Learning in Explanatory Item Response Models.
  • Jul 31, 2025
  • Psychometrika
  • Sun-Joo Cho + 3 more

This study incorporates a random forest (RF) approach into an item response model to probe complex interactions and nonlinearity among predictors, with the goal of using a hybrid approach that outperforms either an RF or an explanatory item response model (EIRM) alone in explaining item responses. In the specified model, called EIRM-RF, predicted values from the RF are added as a predictor in the EIRM to model the nonlinear and interaction effects of person- and item-level predictors in person-by-item response data, while accounting for random effects over persons and items. The results of the EIRM-RF are probed with interpretable machine learning (ML) methods, including feature importance measures, partial dependence plots, accumulated local effect plots, and the H-statistic. The EIRM-RF and the interpretation methods are illustrated using an empirical data set to explain differences in reading comprehension in digital versus paper mediums, and the results of EIRM-RF are compared with those of EIRM and RF to show empirical differences in how the three models capture the effects of predictors and random effects. In addition, simulation studies are conducted to compare model accuracy among the three models and to evaluate the performance of the interpretable ML methods.
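
As a generic illustration of the H-statistic mentioned above (not the study's own code), the following sketch estimates Friedman's pairwise H² from empirical partial dependence functions of any fitted regressor; the synthetic data and the random forest are placeholders.

```python
# A rough, grid-based sketch of Friedman's pairwise H-statistic computed
# from empirical partial dependence (PD) functions; `model` can be any
# fitted regressor exposing .predict.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X[:, 0] * X[:, 1] + X[:, 2] + rng.normal(scale=0.1, size=500)  # true interaction
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

def pd_values(model, X, cols, grid):
    """Empirical PD: average prediction with columns `cols` forced to each grid row."""
    out = np.empty(len(grid))
    for i, vals in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, cols] = vals
        out[i] = model.predict(X_mod).mean()
    return out - out.mean()                   # center, as the H-statistic requires

def h_statistic(model, X, j, k, n_grid=20):
    gj = np.quantile(X[:, j], np.linspace(0.05, 0.95, n_grid))
    gk = np.quantile(X[:, k], np.linspace(0.05, 0.95, n_grid))
    pd_j = pd_values(model, X, [j], gj[:, None])
    pd_k = pd_values(model, X, [k], gk[:, None])
    jk = np.array([(a, b) for a in gj for b in gk])
    pd_jk = pd_values(model, X, [j, k], jk)
    additive = (pd_j[:, None] + pd_k[None, :]).ravel()   # same row-major order as jk
    return np.sum((pd_jk - additive) ** 2) / np.sum(pd_jk ** 2)

print(h_statistic(model, X, 0, 1))            # large: x0 and x1 interact
print(h_statistic(model, X, 0, 2))            # near zero: no interaction
```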

  • Conference Article
  • 10.15396/eres2021_104
Peeking inside the Black Box: Interpretable Machine Learning and Hedonic Rental Estimation
  • Jan 1, 2021
  • Marcelo Cajias + 3 more

Machine Learning (ML) can detect complex relationships to solve problems in various research areas. To estimate real estate prices and rents, ML represents a promising extension to the hedonic literature since it is able to increase predictive accuracy and is more flexible than the standard regression-based hedonic approach in handling a variety of quantitative and qualitative inputs. Nevertheless, its inferential capacity is limited due to its complex non-parametric structure and the ‘black box’ nature of its operations. In recent years, research on Interpretable Machine Learning (IML) has emerged that improves the interpretability of ML applications. This paper aims to elucidate the analytical behaviour of ML methods and their predictions of residential rents by applying a set of model-agnostic methods. Using a dataset of 58k apartment listings in Frankfurt am Main (Germany), we estimate rent levels with the eXtreme Gradient Boosting Algorithm (XGB). We then apply Permutation Feature Importance (PFI), Partial Dependence Plots (PDP), Individual Conditional Expectation (ICE) curves and Accumulated Local Effects (ALE). Our results suggest that IML methods can provide valuable insights and yield higher interpretability of ‘black box’ models. According to the results of PFI, the most relevant locational variables for apartments are proximity to bars, convenience stores and bus station hubs. Feature effects show that ML identifies non-linear relationships between rent and proximity variables. Rental prices increase up to a distance of approximately 3 kilometres from a central bus hub, followed by a steep decline. We therefore assume tenants face a trade-off between good infrastructural accessibility and locational separation from the disamenities associated with traffic hubs, such as noise and air pollution. The same holds true for proximity to bars, with rents peaking at a distance of 1 km. While tenants appear to appreciate nearby nightlife facilities, immediate proximity is subject to rental discounts. In summary, IML methods can increase the transparency of ML models and thereby identify important patterns in rental markets. This may lead to a better understanding of residential real estate and offer new insights for researchers as well as practitioners.
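
A hedged sketch of permutation feature importance (PFI), the first of the methods listed above; since the Frankfurt listings data and the XGB model are not reproduced here, a scikit-learn gradient-boosting regressor on synthetic data stands in.

```python
# PFI: shuffle one column at a time on held-out data and record the drop
# in R^2; a large drop means the model relied on that feature.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=2000, n_features=6, n_informative=3,
                       noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)

result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
for i in np.argsort(result.importances_mean)[::-1]:
    print(f"feature {i}: {result.importances_mean[i]:.3f} "
          f"+/- {result.importances_std[i]:.3f}")
```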

  • Research Article
  • 10.1680/jbren.24.00056
Prediction of interface shear strength between ultra-high-performance concrete and concrete using machine learning method
  • Jun 4, 2025
  • Proceedings of the Institution of Civil Engineers - Bridge Engineering
  • Yuqing Hu + 4 more

Ultra-high-performance concrete (UHPC) bonded to normal concrete (NC) can significantly enhance the mechanical performance of UHPC–NC composite structures, and the interface shear strength is a crucial indicator for assessing the bonding performance. In this study, interpretable machine learning (ML) methods were used to analyse the effects of different parameters on interface shear strength. A database consisting of 305 UHPC–NC shear tests was created, and the isolation forest algorithm was applied to filter outliers. Subsequently, four ML models were trained to predict the interface shear strength of UHPC–NC composite structures. Among them, the extreme gradient boosting (XGBoost) model demonstrated the highest prediction accuracy, achieving an R2 value of 0.95. Shapley additive explanations (SHAP), partial dependence plots (PDP) and individual conditional expectation (ICE) were used for feature importance analysis, aiding in the interpretation of the ‘black box’ nature of the ML models. The results demonstrate that the normal compressive stress at the interface is the most influential factor affecting interfacial shear strength. Finally, a physically meaningful predictive equation for the interface shear strength of UHPC–NC composite structures was proposed based on the XGBoost model combined with curve fitting. This equation enhances the prediction accuracy of interface shear strength for UHPC–NC structures and offers deeper insights into the model's decision-making process.
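
For readers unfamiliar with the SHAP step, a minimal sketch follows; the model and data are stand-ins, not the study's XGBoost pipeline.

```python
# SHAP feature attribution for a tree ensemble: TreeExplainer gives exact
# per-sample, per-feature contributions to each prediction.
import numpy as np
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=500, n_features=5, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)        # shape: (n_samples, n_features)

# Global importance: mean absolute SHAP value per feature.
print(np.abs(shap_values).mean(axis=0))
# shap.summary_plot(shap_values, X)           # beeswarm plot, if interactive
```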

  • Conference Article
  • Citations: 3
  • 10.23919/iconac.2019.8895012
A new machine learning technique for predicting traumatic injuries outcomes based on the vital signs
  • Sep 1, 2019
  • Fatima Almaghrabi + 2 more

Traditional vital signs are an essential part of triage assessment in emergency departments (ED), and have been widely used in trauma prediction models. Previous researchers have studied the effect of vital signs scores on predicting traumatic injury outcomes and have found it to be significant. Based on the vital signs’ scores, an Interpretable Machine Learning (IML) method is proposed to predict patient outcomes and is compared with various ML algorithms. Results indicate that the IML method has a comparable performance with a mean AUC of 0.683, and its interpretability would help in the early identification of trauma patients at risk of mortality.

  • Research Article
  • Citations: 78
  • 10.1093/bib/bbad236
Explainable AI for Bioinformatics: Methods, Tools and Applications.
  • Jul 20, 2023
  • Briefings in Bioinformatics
  • Md Rezaul Karim + 7 more

Artificial intelligence (AI) systems utilizing deep neural networks and machine learning (ML) algorithms are widely used for solving critical problems in bioinformatics, biomedical informatics and precision medicine. However, complex ML models that are often perceived as opaque and black-box methods make it difficult to understand the reasoning behind their decisions. This lack of transparency can be a challenge for both end-users and decision-makers, as well as AI developers. In sensitive areas such as healthcare, explainability and accountability are not only desirable properties but also legally required for AI systems that can have a significant impact on human lives. Fairness is another growing concern, as algorithmic decisions should not show bias or discrimination towards certain groups or individuals based on sensitive attributes. Explainable AI (XAI) aims to overcome the opaqueness of black-box models and to provide transparency in how AI systems make decisions. Interpretable ML models can explain how they make predictions and identify factors that influence their outcomes. However, the majority of the state-of-the-art interpretable ML methods are domain-agnostic and have evolved from fields such as computer vision, automated reasoning or statistics, making direct application to bioinformatics problems challenging without customization and domain adaptation. In this paper, we discuss the importance of explainability and algorithmic transparency in the context of bioinformatics. We provide an overview of model-specific and model-agnostic interpretable ML methods and tools and outline their potential limitations. We discuss how existing interpretable ML methods can be customized and fit to bioinformatics research problems. Further, through case studies in bioimaging, cancer genomics and text mining, we demonstrate how XAI methods can improve transparency and decision fairness. Our review aims at providing valuable insights and serving as a starting point for researchers wanting to enhance explainability and decision transparency while solving bioinformatics problems. GitHub: https://github.com/rezacsedu/XAI-for-bioinformatics.

  • Book Chapter
  • Citations: 311
  • 10.1007/978-3-030-65965-3_28
Interpretable Machine Learning – A Brief History, State-of-the-Art and Challenges
  • Jan 1, 2020
  • Christoph Molnar + 2 more

We present a brief history of the field of interpretable machine learning (IML), give an overview of state-of-the-art interpretation methods, and discuss challenges. Research in IML has boomed in recent years. Young as the field is, its roots reach back over 200 years to regression modeling, and to rule-based machine learning beginning in the 1960s. Recently, many new IML methods have been proposed, many of them model-agnostic, but also interpretation techniques specific to deep learning and tree-based ensembles. IML methods either directly analyze model components, study sensitivity to input perturbations, or analyze local or global surrogate approximations of the ML model. The field approaches a state of readiness and stability, with many methods not only proposed in research but also implemented in open-source software. Yet many important challenges remain for IML, such as dealing with dependent features, causal interpretation, and uncertainty estimation, which need to be resolved for its successful application to scientific problems. A further challenge is the lack of a rigorous definition of interpretability that is accepted by the community. To address the challenges and advance the field, we urge the community to recall its roots of interpretable, data-driven modeling in statistics and (rule-based) ML, but also to consider other areas such as sensitivity analysis, causal inference, and the social sciences.

  • Research Article
  • 10.1177/09544070251330416
Effects of IVIS touchscreen operation tasks on driver’s mental workload based on explainable CatBoost algorithm
  • Apr 28, 2025
  • Proceedings of the Institution of Mechanical Engineers, Part D: Journal of Automobile Engineering
  • Tianzheng Wei + 4 more

With the widespread adoption of touchscreen-based in-vehicle information systems (IVIS), a large amount of information is displayed on these systems, resulting in a significant increase in the frequency of driver interaction with them. However, complex operational tasks can lead to elevated mental workload, thereby impacting driving safety. This study investigates the effects of secondary tasks involving IVIS touchscreen operations on driver mental workload. Through driving simulation experiments and survey questionnaires, three car-following scenarios were designed at speed levels of 60, 40, and 20 km/h. A total of 36 participants completed the IVIS secondary-task driving simulation tests. Using statistical analysis and interpretable machine learning methods, a driver mental workload prediction model based on the CatBoost algorithm was constructed. Shapley Additive exPlanations (SHAP), partial dependence plots (PDP), and individual conditional expectation (ICE) were used to comprehensively analyze the relationship between important driving behavior characteristics and mental workload. The results indicate that as the number of manual operations in IVIS touchscreen secondary tasks increases, the driver's mental workload, standard deviation of speed, standard deviation of lateral offset distance, task completion time, and saccade number increase significantly (p < 0.05). As the driver's mental workload increases, the speed of the following vehicle decreases (p < 0.01), along with a significant reduction in following distance (p < 0.05). When the number of IVIS secondary-task manual operations exceeded three, the probability of a high mental workload increased significantly. These findings provide a basis for designing safer IVIS, contributing to enhanced driving safety and improved driver experience, and hold significant theoretical and practical value.
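
A rough sketch of the ICE analysis named above, computed by hand for a CatBoost classifier; the data, the chosen feature, and the number of curves are illustrative assumptions rather than the study's setup.

```python
# Manual ICE curves: for each instance, vary one feature over a grid while
# holding the others fixed, and plot the predicted probability.
import numpy as np
import matplotlib.pyplot as plt
from catboost import CatBoostClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=400, n_features=5, random_state=0)
model = CatBoostClassifier(iterations=200, verbose=0, random_seed=0).fit(X, y)

j = 0                                          # e.g. "number of touch operations"
grid = np.linspace(X[:, j].min(), X[:, j].max(), 30)
for row in X[:25]:                             # one ICE curve per instance
    X_mod = np.tile(row, (len(grid), 1))
    X_mod[:, j] = grid                         # vary only feature j
    plt.plot(grid, model.predict_proba(X_mod)[:, 1], color="grey", alpha=0.4)

plt.xlabel("feature value"); plt.ylabel("P(high workload)")
plt.show()
```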

  • Research Article
  • Citations: 1
  • 10.1016/j.cscm.2024.e03840
Estimation of compressive strength of concrete with manufactured sand and natural sand using interpretable artificial intelligence
  • Oct 10, 2024
  • Case Studies in Construction Materials
  • Xiaodong Liu + 3 more

To satisfy the design strength of manufactured sand concrete (MSC) in practical engineering applications, a plethora of geotechnical tests are frequently conducted. An effective approach is imperative to reduce the labor and resources consumed by these tests. The objective of this paper is to introduce an interpretable machine learning (ML) method to evaluate the compressive strength (CS) of MSC. First, a dataset was established by compiling experimental results from 208 published studies, and 3382 data points were selected from it for algorithm training. Recursive Feature Elimination with Cross-Validation (RFECV) was employed to select input parameters. Four models with 12 selected input variables and one output variable were constructed to predict the CS of MSC, using Random Forest (RF), Gradient Boosting Decision Trees (GBDT), the eXtreme Gradient Boosting algorithm (XGBoost), and Categorical Boosting (CatBoost). The results show that XGBoost has the highest accuracy and generalization ability (R2 = 0.934, MAE = 3.44, RMSE = 5.16, MAPE = 0.07). To enhance model transparency, SHapley Additive exPlanations (SHAP) was adopted to explain the underlying predictive mechanisms of the ML models. Analyses show that (1) cement content, curing time, and water content are the main features influencing the CS of MSC, and (2) the CS of MSC increases linearly with cement content, increases logarithmically with curing time, and decreases exponentially with water content. Partial Dependence Plots (PDP) and Individual Conditional Expectation (ICE) plots were used to further analyze the impacts of these significant factors on the CS of MSC. Additionally, the Local Interpretable Model-Agnostic Explanations (LIME) method was employed to investigate thresholds for various material dosages in MSC containing 5–10% stone powder. Two typical scenarios were selected for analysis, yielding recommended dosage ranges for concrete of two distinct strengths. Finally, a graphical user interface (GUI) for the CS of MSC has been designed, which might be of great use to material engineers. This provides reference and guidance for concrete engineering practice.
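
A minimal sketch of the RFECV feature-selection step described above; the estimator, data, and scoring choice are assumptions, not the study's configuration.

```python
# RFECV: recursively drop the least important feature and pick the subset
# size that maximizes cross-validated performance.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFECV

X, y = make_regression(n_samples=500, n_features=20, n_informative=8,
                       random_state=0)
selector = RFECV(RandomForestRegressor(n_estimators=100, random_state=0),
                 step=1, cv=5, scoring="r2")
selector.fit(X, y)

print("optimal number of features:", selector.n_features_)
print("selected feature mask:", selector.support_)
```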

  • Conference Article
  • Citations: 16
  • 10.1109/ssci47803.2020.9308404
Taxonomy and Survey of Interpretable Machine Learning Method
  • Dec 1, 2020
  • Saikat Das + 4 more

Since traditional machine learning (ML) techniques use black-box models, the internal operation of a classifier is unknown to humans. Due to this black-box nature of ML classifiers, the trustworthiness of their predictions is sometimes questionable. Interpretable machine learning (IML) is a way of dissecting ML classifiers to overcome this shortcoming and provide a more reasoned explanation of model predictions. In this paper, we explore several IML methods and their applications in various domains. Moreover, we present a detailed survey of IML methods and identify the essential building blocks of a black-box model. We describe the requirements of IML methods and, for completeness, propose a taxonomy of IML methods that classifies each into distinct groupings or sub-categories. The goal is to describe the state of the art for IML methods and to explain them in more concrete and understandable ways by providing a better basis of knowledge for those building blocks and the associated requirements analysis.

  • Research Article
  • Citations: 28
  • 10.1038/s41592-024-02359-7
Applying interpretable machine learning in computational biology-pitfalls, recommendations and opportunities for new developments.
  • Aug 1, 2024
  • Nature methods
  • Valerie Chen + 5 more

Recent advances in machine learning have enabled the development of next-generation predictive models for complex computational biology problems, thereby spurring the use of interpretable machine learning (IML) to unveil biological insights. However, guidelines for using IML in computational biology are generally underdeveloped. We provide an overview of IML methods and evaluation techniques and discuss common pitfalls encountered when applying IML methods to computational biology problems. We also highlight open questions, especially in the era of large language models, and call for collaboration between IML and computational biology researchers.

  • Research Article
  • 10.18502/ijre.v20i1.17622
Predicting the Occurrence of Preterm Birth and Determining its Risk Factors Individually Using an Interpretable Machine Learning Model
  • Jan 15, 2025
  • Iranian Journal of Epidemiology
  • Ramin Farrokhi + 3 more

Background and Objectives: Identifying pregnant women at risk of preterm birth and determining its risk factors is essential because it affects their health. This study aimed to use an interpretable machine learning model to predict preterm birth. Methods: Data from 149,350 births in Tehran in 2019 were utilized from the Iranian Mothers and Babies Network (IMaN) dataset. Various factors related to the mother and the fetus, such as the mother's demographic variables and health status, medical history, pregnancy conditions, childbirth, and associated risks, were considered. Machine learning models, including multilayer neural networks, random forest, and XGBoost, were employed to predict the occurrence of preterm birth after data preprocessing. The models were evaluated based on accuracy, sensitivity, specificity, and area under the ROC curve. Python version 3.10.0 was used to analyze the data. Results: About 8.67% of births were premature. The XGBoost algorithm achieved the highest prediction accuracy (90%). According to the model output, multiple birth had the highest importance score (46%), followed by delivery risk factors (41%); other variables, including neurological and mental illness, preeclampsia, and cardiovascular disease, were subsequently ranked in order of importance for individual predictions. Conclusion: An interpretable machine learning method can predict the occurrence of preterm birth. Based on risk factors, it can provide personalized preventive recommendations for every pregnant woman, aiming to reduce the risk of preterm birth.
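
For reference, a short sketch of the evaluation metrics named above for a binary classifier; the labels and probabilities here are fabricated placeholders, not the IMaN data.

```python
# Accuracy, sensitivity, specificity, and AUC from a confusion matrix and
# predicted probabilities for a binary outcome.
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score

rng = np.random.default_rng(0)                 # placeholder predictions
y_true = rng.integers(0, 2, size=1000)
p = np.clip(y_true * 0.6 + rng.normal(0.2, 0.2, size=1000), 0, 1)
y_pred = (p >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("accuracy:   ", accuracy_score(y_true, y_pred))
print("sensitivity:", tp / (tp + fn))          # true-positive rate
print("specificity:", tn / (tn + fp))          # true-negative rate
print("AUC:        ", roc_auc_score(y_true, p))
```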

  • Research Article
  • 10.3390/land14020386
Analysis of the Driving Mechanism of Grassland Degradation in Inner Mongolia Grassland from 2015 to 2020 Using Interpretable Machine Learning Methods
  • Feb 12, 2025
  • Land
  • Zuopei Zhang + 2 more

In traditional studies on grassland degradation drivers, researchers often lacked the flexibility to selectively consider driving factors and quantitatively depict their contributions. Interpretable machine learning offers a solution to these challenges. This study focuses on Inner Mongolia, China, incorporating four categories and sixteen specific driving factors, and employing four machine learning techniques (Logistic Regression, Random Forest, XGBoost, and LightGBM) to investigate regional grassland changes. Using the SHAP approach, contributions of driving factors were quantitatively analyzed. The findings reveal the following: (1) Between 2015 and 2020, Inner Mongolia experienced significant grassland degradation, with an affected area reaching 12.12 thousand square kilometers. (2) Among the machine learning models tested, the LightGBM model exhibited superior prediction accuracy (0.89), capability (0.9), and stability (0.76). (3) Key factors driving grassland changes in Inner Mongolia include variations in rural population, livestock numbers, average temperatures during the growth season, peak temperatures, and proximity to roads. (4) In eastern and western Inner Mongolia, changes in rural population (31.4%) are the primary degradation drivers; in the central region, livestock number changes (41.1%) dominate; and in the southeast, climate changes (19.3%) are paramount. This work exemplifies the robust utility of interpretable machine learning in predicting grassland degradation and offers insights for policymakers and similar ecological regions.
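
A hedged sketch of how per-region driver contributions like those reported above can be derived from SHAP values; the array shapes, region labels, and feature names are hypothetical placeholders.

```python
# Relative driver contributions: normalize mean |SHAP| per feature within a
# sub-region so the shares sum to 100%.
import numpy as np

def driver_contributions(shap_values, feature_names, mask):
    """Percent share of mean |SHAP| per feature within one sub-region."""
    mean_abs = np.abs(shap_values[mask]).mean(axis=0)
    share = 100 * mean_abs / mean_abs.sum()
    return dict(zip(feature_names, np.round(share, 1)))

# Example with fabricated inputs (values are placeholders):
rng = np.random.default_rng(0)
shap_values = rng.normal(size=(1000, 4))
region = rng.choice(["east", "central", "southeast"], size=1000)
names = ["rural_pop", "livestock", "temperature", "road_proximity"]
print(driver_contributions(shap_values, names, region == "central"))
```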

  • Research Article
  • 10.3389/fimmu.2025.1528046
Interpretable machine learning algorithms reveal gut microbiome features associated with atopic dermatitis.
  • May 1, 2025
  • Frontiers in immunology
  • Jingtai Ma + 8 more

The "gut-skin axis" has been proposed to play an important role in the development and symptoms of atopic dermatitis. Therefore, we have constructed an interpretable machine learning framework to quantitatively screen key gut flora. The 16S rRNA dataset, after applying the centered log-ratio transformation, was analyzed using five different machine learning models: random forest, light gradient boosting machine, extreme gradient boosting, support vector machine with radial kernel, and logistic regression. Interpretable machine learning methods, such as SHAP values, were used to identify significant features associated with atopic dermatitis. Random forest performed better than the other "tree" models in the validation partitions. The SHAP global dependency plot indicated that Bifidobacterium ranked as the strongest predictive factor across all prediction horizons, although the SHAP values for some features were still higher in support vector machine and logistic regression models. The SHAP partial dependency plot for "tree" models showed that the best segmentation point for Bifidobacterium was further from the origin compared to other features in the respective models, quantitatively reflecting differences in gut microbiota. Machine learning models combined with SHAP could be used to quantitatively screen key gut flora in atopic dermatitis patients, providing doctors with an intuitive understanding of 16S rRNA sequencing data to support precision medicine in care and recovery.

  • Research Article
  • Citations: 146
  • 10.1016/j.autcon.2021.103821
An engineer's guide to eXplainable Artificial Intelligence and Interpretable Machine Learning: Navigating causality, forced goodness, and the false perception of inference
  • Jul 2, 2021
  • Automation in Construction
  • M.Z Naser

  • Research Article
  • Citations: 3
  • 10.3390/jcm13051222
Using an Interpretable Amino Acid-Based Machine Learning Method to Enhance the Diagnosis of Major Depressive Disorder
  • Feb 21, 2024
  • Journal of Clinical Medicine
  • Cyrus Su Hui Ho + 5 more

Background: Major depressive disorder (MDD) is a leading cause of disability worldwide. At present, however, there are no established biomarkers that have been validated for diagnosing and treating MDD. This study sought to assess the diagnostic and predictive potential of the differences in serum amino acid concentration levels between MDD patients and healthy controls (HCs), integrating them into interpretable machine learning models. Methods: In total, 70 MDD patients and 70 HCs matched in age, gender, and ethnicity were recruited for the study. Serum amino acid profiling was conducted by means of chromatography-mass spectrometry. A total of 21 metabolites were analysed, with 17 from a preset amino acid panel and the remaining 4 from a preset kynurenine panel. Logistic regression was applied to differentiate MDD patients from HCs. Results: The best-performing model utilised both feature selection and hyperparameter optimisation and yielded a moderate area under the receiver operating characteristic curve (AUC) classification value of 0.76 on the testing data. The top five metabolites identified as potential biomarkers for MDD were 3-hydroxy-kynurenine, valine, kynurenine, glutamic acid, and xanthurenic acid. Conclusions: Our study highlights the potential of using an interpretable machine learning analysis model based on amino acids to aid and increase the diagnostic accuracy of MDD in clinical practice.
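
A hedged sketch of the workflow described above, combining feature selection and hyperparameter optimisation for logistic regression, scored by AUC; since the metabolite data are not public, a synthetic 21-feature panel with 140 samples stands in.

```python
# Logistic regression with univariate feature selection and grid-searched
# hyperparameters, evaluated by test-set AUC.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=140, n_features=21, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

pipe = Pipeline([("scale", StandardScaler()),
                 ("select", SelectKBest(f_classif)),
                 ("clf", LogisticRegression(max_iter=1000))])
grid = GridSearchCV(pipe,
                    {"select__k": [5, 10, 15, 21],
                     "clf__C": [0.01, 0.1, 1.0, 10.0]},
                    scoring="roc_auc", cv=5).fit(X_tr, y_tr)

print("best params:", grid.best_params_)
print("test AUC:", roc_auc_score(y_te, grid.predict_proba(X_te)[:, 1]))
```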
