Explaining Person-by-Item Responses using Person- and Item-Level Predictors via Random Forests and Interpretable Machine Learning in Explanatory Item Response Models.
This study incorporates a random forest (RF) approach for probing complex interactions and nonlinearity among predictors into an item response model, with the goal of using a hybrid approach that outperforms either an RF or an explanatory item response model (EIRM) alone in explaining item responses. In the specified model, called EIRM-RF, predicted values from the RF are added as a predictor in the EIRM to model the nonlinear and interaction effects of person- and item-level predictors in person-by-item response data, while accounting for random effects over persons and items. The results of the EIRM-RF are probed with interpretable machine learning (ML) methods, including feature importance measures, partial dependence plots, accumulated local effect plots, and the H-statistic. The EIRM-RF and the interpretable methods are illustrated using an empirical data set to explain differences in reading comprehension in digital versus paper mediums, and the results of the EIRM-RF are compared with those of the EIRM and RF alone to show empirical differences in modeling the effects of predictors and random effects across the three models. In addition, simulation studies are conducted to compare model accuracy among the three models and to evaluate the performance of the interpretable ML methods.
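The two-stage hybrid idea above can be sketched in a few lines. The following is a simplified illustration on synthetic data, assuming scikit-learn; the random effects over persons and items are omitted here (fitting those would require a mixed-effects package), so this is a fixed-effects caricature of EIRM-RF, not the paper's model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n = 600
X = rng.normal(size=(n, 4))                      # person- and item-level predictors
logit = X[:, 0] - X[:, 1] + X[:, 2] * X[:, 3]    # true effects include an interaction
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Stage 1: the RF captures nonlinearity and interactions; out-of-fold
# predictions avoid leaking training fit into the second stage.
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf_pred = cross_val_predict(rf, X, y, cv=5, method="predict_proba")[:, 1]

# Stage 2: the RF prediction enters as an extra predictor in a logistic
# item-response-style model (fixed effects only in this sketch).
Z = np.column_stack([X, rf_pred])
eirm_rf = LogisticRegression(max_iter=1000).fit(Z, y)
print(round(eirm_rf.score(Z, y), 3))
```

The out-of-fold step matters: feeding in-sample RF predictions into the second stage would overstate the hybrid's fit.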
- Research Article
2
- 10.1080/00015385.2025.2481662
- Apr 7, 2025
- Acta Cardiologica
Background Predicting the prognosis of patients with acute myocardial infarction (AMI) combined with diabetes mellitus (DM) is crucial due to high in-hospital mortality rates. This study aims to develop and validate a mortality risk prediction model for these patients using interpretable machine learning (ML) methods. Methods Data were sourced from the Medical Information Mart for Intensive Care IV (MIMIC-IV, version 2.2). Predictors were selected by least absolute shrinkage and selection operator (LASSO) regression and checked for multicollinearity with Spearman's correlation. Patients were randomly assigned to training and validation sets in an 8:2 ratio. Seven ML algorithms were used to construct models in the training set. Model performance was evaluated in the validation set using metrics such as area under the curve (AUC) with 95% confidence interval (CI), calibration curves, precision, recall, F1 score, accuracy, negative predictive value (NPV), and positive predictive value (PPV). The significance of differences in predictive performance among models was assessed utilising the permutation test, and 10-fold cross-validation further validated the models' performance. SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME) were applied to interpret the models. Results The study included 2,828 patients with AMI combined with DM. Nineteen predictors were identified through LASSO regression and Spearman's correlation. The Random Forest (RF) model demonstrated the best performance, with an AUC of 0.823 (95% CI: 0.774–0.872), high precision (0.867), accuracy (0.873), and PPV (0.867). The RF model showed significant differences (p < 0.05) compared with the K-Nearest Neighbours and Decision Tree models. Calibration curves indicated that the RF model's predicted risk aligned well with actual outcomes. 10-fold cross-validation confirmed the superior performance of the RF model, with an average AUC of 0.828 (95% CI: 0.800–0.842).
Variable importance in the RF model indicated that the top eight predictors were urine output, maximum anion gap, maximum urea nitrogen, age, minimum pH, maximum international normalised ratio (INR), mean respiratory rate, and mean systolic blood pressure. Conclusion This study demonstrates the potential of ML methods, particularly the RF model, in predicting in-hospital mortality risk for AMI patients with DM. The SHAP and LIME methods enhance the interpretability of the ML models.
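The selection-then-classification pipeline described above can be sketched as follows, assuming scikit-learn, with synthetic data standing in for MIMIC-IV and an L1-penalised logistic model standing in for the LASSO step:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 30))                  # synthetic clinical features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=1000) > 0).astype(int)

# 8:2 train/validation split, as in the abstract
Xtr, Xva, ytr, yva = train_test_split(X, y, test_size=0.2, random_state=1)

# LASSO-style selection: keep features with nonzero L1 coefficients
sel = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
sel.fit(StandardScaler().fit_transform(Xtr), ytr)
keep = np.flatnonzero(sel.coef_[0])

# RF on the selected features, scored by AUC on the validation set
rf = RandomForestClassifier(n_estimators=300, random_state=1)
rf.fit(Xtr[:, keep], ytr)
auc = roc_auc_score(yva, rf.predict_proba(Xva[:, keep])[:, 1])
print(len(keep), round(auc, 3))
```

The SHAP/LIME interpretation step would follow with the `shap` and `lime` packages, which are not assumed here.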
- Research Article
1
- 10.1007/s12553-025-00957-9
- Mar 13, 2025
- Health and Technology
Purpose: We wished to investigate whether the risk of acute hospitalizations of chronic heart failure (CHF) patients could be predicted from biweekly measurements of pulse, blood pressure and weight. We emphasized machine learning models with a high degree of interpretability, due to low adoption of complex machine learning models in clinical practice. Methods: Using 11,575 measurements of pulse, blood pressure and weight belonging to 122 patients, we trained three types of machine learning algorithms, logistic regression, Random Forest and "RuleFit", to predict nonelective hospitalization within the next 14 days. We used a fivefold cross-validation framework to estimate performance metrics, including f-measure, Receiver Operating Characteristic Area Under the Curve (ROC-AUC), sensitivity and specificity. Results: A simple interpretable machine learning algorithm, logistic regression with the least absolute shrinkage and selection operator (lasso), performed the best. The regression based on simple features performed with a ROC-AUC of 0.622 (sensitivity = 0.185, specificity = 0.93), while the regression based on a more complex feature set performed with a ROC-AUC of 0.657 (sensitivity = 0.212, specificity = 0.921). Conclusion: In our study, simple interpretable methods outperformed more complex black box machine learning methods in predicting hospitalization of heart failure patients. This suggests that interpretable methods are appropriate in this context. However, the strength of the results is slightly limited by the overall modest performance of the models and the small sample size. Clinical Trial Registration: The original trial was registered at ClinicalTrials.gov with the identification number NCT02860013 on 9 August 2016.
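The winning approach above, a lasso logistic regression evaluated by cross-validated ROC-AUC plus sensitivity and specificity, is easy to sketch with scikit-learn. Synthetic data stand in for the biweekly pulse, blood-pressure and weight features:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(2)
X = rng.normal(size=(800, 6))            # e.g. pulse, BP, weight and their changes
y = (0.8 * X[:, 0] - 0.6 * X[:, 1] + rng.normal(size=800) > 1.0).astype(int)

# L1 ("lasso") logistic regression, scored out-of-fold over five folds
model = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
proba = cross_val_predict(model, X, y, cv=5, method="predict_proba")[:, 1]

# sensitivity/specificity at a 0.5 threshold, as reported in the abstract
tn, fp, fn, tp = confusion_matrix(y, proba >= 0.5).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
print(round(roc_auc_score(y, proba), 3), round(sensitivity, 3), round(specificity, 3))
```

The low sensitivity / high specificity pattern in the abstract typically reflects class imbalance at a fixed 0.5 threshold; moving the threshold trades one for the other along the ROC curve.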
- Research Article
- 10.18553/jmcp.2025.31.4.406
- Apr 1, 2025
- Journal of managed care & specialty pharmacy
The cost of health care for patients with Hodgkin lymphoma (HL) is projected to rise, making it essential to understand expenditure drivers across different demographics, including the older adult population. Although older HL patients constitute a substantial share of HL patients, the literature on health care expenditures in older HL patients is lacking. Predictive capabilities of machine learning (ML) methods enhance our ability to leverage a data-driven approach, which helps identify key predictors of expenditures and strategically plan future expenditures. The objective was to determine the leading predictors of health care expenditures among older HL survivors across prediagnosis, treatment, and posttreatment phases of care. The study uses a retrospective research design to identify incident cases of HL diagnosed between 2009 and 2017 using Surveillance, Epidemiology, and End Results-Medicare data. Three phases of cancer care (prediagnosis, treatment, and posttreatment) were indexed around the diagnosis date, with each phase divided into 12 months of baseline and 12 months of follow-up. ML methods, including XGBoost, Random Forest, and cross-validated linear regressions, were used to identify the best regression model for predicting Medicare and out-of-pocket (OOP) health care expenditures. The interpretable ML SHapley Additive exPlanations (SHAP) method was used to identify the leading predictors of Medicare and OOP health care expenditures in each phase. The study analyzed 1,242 patients in the prediagnosis phase, 902 in the treatment phase, and 873 in the posttreatment phase. XGBoost regression outperformed the Random Forest and cross-validated linear regression models in predicting Medicare expenditures, with R-squared (root mean square error) values of 0.42 (1.39), 0.43 (0.56), and 0.46 (0.90) across the 3 phases of care, respectively.
Interpretable ML methods highlighted baseline expenditures, number of prescription medications, and cardiac dysrhythmia as the leading predictors of Medicare and OOP expenditures in the prediagnosis phase. Chemotherapy plus immunotherapy and surgical treatment plus immunotherapy were the leading predictors of expenditures in the treatment and posttreatment phases, respectively. As ML applications for predicting health care expenditure increase, researchers should consider implementing models in different phases of care to identify changes in the predictors. Leading predictors of health care expenditures can be targeted for informed policy development to address financial hardship in HL survivors.
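The model-comparison step above (boosted trees versus a linear baseline, scored by R-squared and RMSE on held-out data) can be sketched as follows. Scikit-learn's gradient boosting stands in for XGBoost here, an assumption made so the sketch needs no extra packages; the data are synthetic stand-ins for the Medicare claims features:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.normal(size=(900, 8))            # e.g. baseline spend, comorbidity flags
y = X[:, 0] ** 2 + X[:, 1] + 0.3 * rng.normal(size=900)   # nonlinear target

Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.25, random_state=3)
for name, model in [("boosting", GradientBoostingRegressor(random_state=3)),
                    ("linear", LinearRegression())]:
    pred = model.fit(Xtr, ytr).predict(Xte)
    rmse = mean_squared_error(yte, pred) ** 0.5
    print(name, round(r2_score(yte, pred), 2), round(rmse, 2))
```

On a target with a squared term, the boosted model should recover more variance than the linear baseline, which mirrors why tree ensembles won the comparison in the abstract.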
- Research Article
38
- 10.1016/j.fcr.2022.108640
- Oct 1, 2022
- Field Crops Research
Interpretable machine learning methods to explain on-farm yield variability of high productivity wheat in Northwest India
- Research Article
56
- 10.1371/journal.pone.0284315
- May 4, 2023
- PLOS ONE
Machine learning (ML) models are used in clinical metabolomics studies most notably for biomarker discoveries, to identify metabolites that discriminate between a case and control group. To improve understanding of the underlying biomedical problem and to bolster confidence in these discoveries, model interpretability is germane. In metabolomics, partial least square discriminant analysis (PLS-DA) and its variants are widely used, partly due to the model's interpretability with the Variable Influence in Projection (VIP) scores, a global interpretable method. Herein, Tree-based Shapley Additive explanations (SHAP), an interpretable ML method grounded in game theory, was used to explain ML models with local explanation properties. In this study, ML experiments (binary classification) were conducted for three published metabolomics datasets using PLS-DA, random forests, gradient boosting, and extreme gradient boosting (XGBoost). Using one of the datasets, the PLS-DA model was explained using VIP scores, while one of the best-performing models, a random forest model, was interpreted using Tree SHAP. The results show that SHAP offers greater explanatory depth than PLS-DA's VIP, making it a powerful method for rationalizing machine learning predictions from metabolomics studies.
- Conference Article
- 10.15396/eres2021_104
- Jan 1, 2021
Machine Learning (ML) can detect complex relationships to solve problems in various research areas. To estimate real estate prices and rents, ML represents a promising extension to the hedonic literature since it is able to increase predictive accuracy and is more flexible than the standard regression-based hedonic approach in handling a variety of quantitative and qualitative inputs. Nevertheless, its inferential capacity is limited due to its complex non-parametric structure and the 'black box' nature of its operations. In recent years, research on Interpretable Machine Learning (IML) has emerged that improves the interpretability of ML applications. This paper aims to elucidate the analytical behaviour of ML methods and their predictions of residential rents by applying a set of model-agnostic methods. Using a dataset of 58k apartment listings in Frankfurt am Main (Germany), we estimate rent levels with the eXtreme Gradient Boosting algorithm (XGB). We then apply Permutation Feature Importance (PFI), Partial Dependence Plots (PDP), Individual Conditional Expectation (ICE) curves and Accumulated Local Effects (ALE). Our results suggest that IML methods can provide valuable insights and yield higher interpretability of 'black box' models. According to the results of PFI, the most relevant locational variables for apartments are the proximity to bars, convenience stores and bus station hubs. Feature effects show that ML identifies non-linear relationships between rent and proximity variables. Rental prices increase up to a distance of approx. 3 kilometers from a central bus hub, followed by a steep decline. We therefore assume tenants face a trade-off between good infrastructural accessibility and locational separation from the disamenities associated with traffic hubs such as noise and air pollution. The same holds true for proximity to bars, with rents peaking at 1 km distance.
While tenants appear to appreciate nearby nightlife facilities, immediate proximity is subject to rental discounts. In summary, IML methods can increase transparency of ML models and therefore identify important patterns in rental markets. This may lead to a better understanding of residential real estate and offer new insights for researchers as well as practitioners.
- Research Article
- 10.33425/2639-846x.1065
- Aug 31, 2022
- Anesthesia & Pain Research
Purpose: Topical analgesics have gained acceptance in guidelines for the treatment of pain. The Kailo Pain Patch® is a topically applied analgesic adhesive patch, with a recent study showing reduced pain severity and interference scales in comparison to a control group. However, as with any analgesic modality, treatment response is variable. Advances in technology, such as pharmacogenomic evaluation and machine learning (artificial intelligence), have emerged as tools to assist clinicians with selecting the most suitable treatments for a variety of disease states. There are limited data on the use of these technologies for pain management; only a few studies have applied machine learning to personalize the treatment of chronic pain patients. This report analyzed the PREVENT Study using an existing modified interpretable machine learning method to personalize the selection of the most suitable protocol for use of the Kailo Pain Patch® and other topical analgesics. Patients and methods: Data from the IRB-approved observational PREVENT study were used in the present analysis of 128 (89 females, 39 males) chronic pain patients and 20 controls answering the Brief Pain Inventory (BPI) questionnaire along with additional questions at baseline and after 30 days of treatment with the Kailo Pain Patch®. An interpretable machine-learning model was used to build pain outcome prediction models. This method is a multi-objective ensemble classification/regression technique, which combines multi-objective evolutionary algorithms with Support Vector Machines, Random Forests, and feature filtering techniques to optimize the classification model and minimize the utilized feature subset. Three basic endpoints were examined as outputs to the prediction models: Total BPI Severity, Total BPI Interference, and Total medication changes in the follow-up period.
Both classification and regression models were constructed for these endpoints, and a "leave-one-out" cross-validation strategy was used to evaluate the generalization ability, classification, and regression performance of the deployed models. Results: Experimental results showed that the models trained with the proposed machine learning method were able to predict endpoints with high accuracy, with AUC exceeding 90% and the Spearman correlation metric exceeding 0.4 for all endpoints, surpassing the classification and regression performance of other benchmark models, including the recently introduced XGBoost. The interpretable machine learning method was able to reduce the number of significant features to 15 and to identify some of the most important characteristics of responders and non-responders, allowing for a personalized approach to creating an individualized pain treatment plan. Applying the trained model to a previous IRB-approved observational study (OPERA) dataset (631 chronic pain patients) demonstrated that most of the participants (>70%) who did not benefit from other topical analgesic therapies, as well as more than 50% of responders to OPERA study medications, would have noted improvement from the pain patch studied in PREVENT. Conclusions: Artificial intelligence and machine learning technologies are advancing multiple areas in the fields of medicine, including pain management. A model has been developed which continues to be refined; here we show use of that model for predicting response to topical analgesic therapies. We will continue to refine these tools and make them available to front-line clinicians through a user-friendly web interface (https://kailo.insybio.com/) that can be used to support analgesic clinical decision making.
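The "leave-one-out" validation strategy named above is a natural choice for a small cohort (128 patients here): each patient is held out once and predicted by a model trained on all the others. A minimal sketch with scikit-learn, on synthetic stand-in data since the PREVENT features are not public:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(6)
X = rng.normal(size=(60, 5))             # small n, as in small clinical cohorts
y = (X[:, 0] + 0.5 * rng.normal(size=60) > 0).astype(int)

# every sample is predicted by a model that never saw it
clf = RandomForestClassifier(n_estimators=100, random_state=6)
proba = cross_val_predict(clf, X, y, cv=LeaveOneOut(), method="predict_proba")[:, 1]
auc = roc_auc_score(y, proba)
print(round(auc, 3))
```

Leave-one-out maximizes training data per fold at the cost of fitting n models, which is affordable at this sample size.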
- Research Article
92
- 10.1029/2017gc007401
- Apr 1, 2018
- Geochemistry, Geophysics, Geosystems
Geochemically discriminating between magmatism in different tectonic settings remains a fundamental part of understanding the processes of magma generation within the Earth's mantle. Here we present an approach where machine learning (ML) methods are used for quantitative tectonic discrimination and feature selection using global geochemical data sets containing data for volcanic rocks generated in eight different tectonic settings. This study uses support vector machine, random forest, and sparse multinomial regression (SMR) approaches. All these ML methods with data for 24 elements and five isotopic ratios allowed the successful geochemical discrimination between igneous rocks formed in eight different tectonic settings with a discriminant ratio better than 83% for all settings barring oceanic plateaus and back‐arc basins. SMR is a particularly powerful and interpretable ML method because it quantitatively identifies geochemical signatures that characterize the tectonic settings of interest and the characteristics of each sample as a probability of the membership of the sample for each setting. We also present the most representative basalt composition for each tectonic setting. The new data provide reference points for future geochemical discussions. Our results indicate that at least 17 elements and isotopic ratios are required to characterize each tectonic setting, suggesting that geochemical tectonic discrimination cannot be achieved using only a small number of elemental compositions and/or isotopic ratios. The results show that volcanic rocks formed in different tectonic settings have unique geochemical signatures, indicating that both volcanic rock geochemistry and magma generation processes are closely connected to the tectonic setting.
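Sparse multinomial regression (SMR), highlighted above for its interpretability, can be sketched as an L1-penalised multinomial logistic model: it assigns each sample a probability of membership in each class while zeroing out uninformative features. The data below are synthetic stand-ins for the geochemical compositions (only the first four "elements" carry signal):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
n, p, k = 600, 12, 3                     # samples, "elements", "tectonic settings"
X = rng.normal(size=(n, p))
centers = np.zeros((k, p))
centers[:, :4] = rng.normal(scale=2.0, size=(k, 4))   # 4 informative features
y = rng.integers(0, k, size=n)
X += centers[y]

# L1 penalty induces sparsity: uninformative elements get zero coefficients
smr = LogisticRegression(penalty="l1", solver="saga", C=0.1, max_iter=2000)
smr.fit(X, y)

sparsity = np.mean(smr.coef_ == 0)       # fraction of zeroed coefficients
membership = smr.predict_proba(X[:1])    # per-setting membership probabilities
print(round(float(sparsity), 2), membership.round(2))
```

The nonzero coefficients per class are exactly the "geochemical signatures" the abstract describes: features the model needs to separate that setting from the rest.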
- Research Article
- 10.1002/bimj.70089
- Oct 30, 2025
- Biometrical Journal. Biometrische Zeitschrift
With the spread and rapid advancement of black box machine learning (ML) models, the field of interpretable machine learning (IML) or explainable artificial intelligence (XAI) has become increasingly important over the last decade. This is particularly relevant for survival analysis, where the adoption of IML techniques promotes transparency, accountability, and fairness in sensitive areas, such as clinical decision-making processes, the development of targeted therapies, interventions, or in other medical or healthcare-related contexts. More specifically, explainability can uncover a survival model's potential biases and limitations and provide more mathematically sound ways to understand how and which features are influential for prediction or constitute risk factors. However, the lack of readily available IML methods may have deterred practitioners from leveraging the full potential of ML for predicting time-to-event data. We present a comprehensive review of the existing work on IML methods for survival analysis within the context of the general IML taxonomy. In addition, we formally detail how commonly used IML methods, such as individual conditional expectation (ICE), partial dependence plots (PDP), accumulated local effects (ALE), different feature importance measures, or Friedman's H-interaction statistics can be adapted to survival outcomes. An application of several IML methods to data on breast cancer recurrence in the German Breast Cancer Study Group (GBSG2) serves as a tutorial or guide for researchers on how to utilize the techniques in practice to facilitate understanding of model decisions or predictions.
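Friedman's H-interaction statistic, named above, compares the joint partial dependence of a feature pair with the sum of their individual partial dependences; a large value means the pair's joint effect is not additive. A minimal, non-survival sketch on synthetic data, estimating the statistic at the observed points with scikit-learn (the survival adaptations in the paper would replace the prediction function):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(8)
X = rng.normal(size=(200, 3))
y = X[:, 0] * X[:, 1] + X[:, 2]          # true interaction between features 0 and 1
model = GradientBoostingRegressor(random_state=8).fit(X, y)

def pd_at(model, X, cols, vals):
    """Partial dependence at one point: fix `cols` to `vals`, average predictions."""
    Xm = X.copy()
    Xm[:, cols] = vals
    return model.predict(Xm).mean()

def h_statistic(model, X, j, k):
    """Friedman's H^2 for the feature pair (j, k), estimated at the data points."""
    pd_j = np.array([pd_at(model, X, [j], x[j]) for x in X])
    pd_k = np.array([pd_at(model, X, [k], x[k]) for x in X])
    pd_jk = np.array([pd_at(model, X, [j, k], x[[j, k]]) for x in X])
    pd_j, pd_k, pd_jk = (v - v.mean() for v in (pd_j, pd_k, pd_jk))
    return np.sum((pd_jk - pd_j - pd_k) ** 2) / np.sum(pd_jk ** 2)

h01 = h_statistic(model, X, 0, 1)        # interacting pair: large
h02 = h_statistic(model, X, 0, 2)        # additive pair: small
print(round(h01, 2), round(h02, 2))
```

This brute-force estimator is O(n²) model evaluations per pair, which is fine at this scale but motivates the sampled approximations used in practice.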
- Research Article
92
- 10.1093/bib/bbad236
- Jul 20, 2023
- Briefings in Bioinformatics
Artificial intelligence (AI) systems utilizing deep neural networks and machine learning (ML) algorithms are widely used for solving critical problems in bioinformatics, biomedical informatics and precision medicine. However, complex ML models that are often perceived as opaque and black-box methods make it difficult to understand the reasoning behind their decisions. This lack of transparency can be a challenge for both end-users and decision-makers, as well as AI developers. In sensitive areas such as healthcare, explainability and accountability are not only desirable properties but also legally required for AI systems that can have a significant impact on human lives. Fairness is another growing concern, as algorithmic decisions should not show bias or discrimination towards certain groups or individuals based on sensitive attributes. Explainable AI (XAI) aims to overcome the opaqueness of black-box models and to provide transparency in how AI systems make decisions. Interpretable ML models can explain how they make predictions and identify factors that influence their outcomes. However, the majority of the state-of-the-art interpretable ML methods are domain-agnostic and have evolved from fields such as computer vision, automated reasoning or statistics, making direct application to bioinformatics problems challenging without customization and domain adaptation. In this paper, we discuss the importance of explainability and algorithmic transparency in the context of bioinformatics. We provide an overview of model-specific and model-agnostic interpretable ML methods and tools and outline their potential limitations. We discuss how existing interpretable ML methods can be customized and fit to bioinformatics research problems. Further, through case studies in bioimaging, cancer genomics and text mining, we demonstrate how XAI methods can improve transparency and decision fairness. 
Our review aims to provide valuable insights and to serve as a starting point for researchers wanting to enhance explainability and decision transparency while solving bioinformatics problems. GitHub: https://github.com/rezacsedu/XAI-for-bioinformatics.
- Research Article
30
- 10.1016/j.conbuildmat.2023.133553
- Oct 2, 2023
- Construction and Building Materials
Shear strength prediction of FRP-strengthened concrete beams using interpretable machine learning
- Research Article
1
- 10.18502/ijre.v20i1.17622
- Jan 15, 2025
- Iranian Journal of Epidemiology
Background and Objectives: Identifying pregnant women who are at risk of premature birth and determining its risk factors is essential because it affects their health. This study aimed to use an interpretable machine-learning model to predict premature birth. Methods: In this study, data from 149,350 births in Tehran in 2019 were utilized from the Iranian Mothers and Babies Network (IMaN) dataset. Various factors related to the mother and the fetus, such as the mother's demographic variables and health status, medical history, pregnancy conditions, childbirth, and associated risks, were considered. Machine learning models, including multilayer neural networks, random forest, and XGBoost, were employed to predict the occurrence of preterm birth after data preprocessing. The models were evaluated based on accuracy, sensitivity, specificity, and area under the ROC curve. The Python programming language, version 3.10.0, was used to analyze the data. Results: About 8.67% of births were premature. The XGBoost algorithm achieved the highest prediction accuracy (90%). According to the model output, multiple birth had the highest importance score (46%), followed by delivery risk factors (41%); other variables, including neurological and mental illness, preeclampsia, and cardiovascular disease, were ranked next in order of importance. Conclusion: Using an interpretable machine learning method could predict the occurrence of premature birth. Based on risk factors, the interpretable machine learning method can provide personalized preventive recommendations for every pregnant woman, aiming to reduce the risk of preterm birth.
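The importance-ranking step described above can be sketched with a boosted classifier on a rare, imbalanced outcome. Scikit-learn's gradient boosting stands in for XGBoost (an assumption for self-containedness), and one synthetic feature is built to dominate the risk, mimicking the role of multiple birth:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(9)
X = rng.normal(size=(2000, 6))
# feature 0 drives most of the (rare) outcome's risk
risk = 1 / (1 + np.exp(-(2.5 * X[:, 0] + 0.5 * X[:, 1] - 4.0)))
y = (rng.random(2000) < risk).astype(int)   # roughly one-in-ten positives

clf = GradientBoostingClassifier(random_state=9).fit(X, y)
ranking = np.argsort(clf.feature_importances_)[::-1]
print(ranking[0], clf.feature_importances_.round(2))
```

Impurity-based importances like these sum to 1 and give a quick global ranking; for an interpretable, per-patient view one would layer SHAP-style attributions on top, as the abstract's "personalized recommendations" imply.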
- Research Article
8
- 10.3758/s13428-022-01910-8
- Jul 11, 2022
- Behavior Research Methods
To obtain more accurate and robust feedback information from the students’ assessment outcomes and to communicate it to students and optimize teaching and learning strategies, educational researchers and practitioners must critically reflect on whether the existing methods of data analytics are capable of retrieving the information provided in the database. This study compared and contrasted the prediction performance of an item response theory method, particularly the use of an explanatory item response model (EIRM), and six supervised machine learning (ML) methods for predicting students’ item responses in educational assessments, considering student- and item-related background information. Each of seven prediction methods was evaluated through cross-validation approaches under three prediction scenarios: (a) unrealized responses of new students to existing items, (b) unrealized responses of existing students to new items, and (c) missing responses of existing students to existing items. The results of a simulation study and two real-life assessment data examples showed that employing student- and item-related background information in addition to the item response data substantially increases the prediction accuracy for new students or items. We also found that the EIRM is as competitive as the best performing ML methods in predicting the student performance outcomes for the educational assessment datasets.
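The three prediction scenarios above map directly onto different cross-validation splitters over a person-by-item response table: grouping folds by student for scenario (a), by item for scenario (b), and plain row-wise folds for scenario (c). A short sketch, assuming scikit-learn:

```python
import numpy as np
from sklearn.model_selection import GroupKFold, KFold

rng = np.random.default_rng(10)
n_students, n_items = 50, 20
students = np.repeat(np.arange(n_students), n_items)   # one row per response
items = np.tile(np.arange(n_items), n_students)
responses = rng.integers(0, 2, size=n_students * n_items)

gkf = GroupKFold(n_splits=5)

# (a) new students: no student appears in both train and test
tr, te = next(iter(gkf.split(responses, groups=students)))
assert set(students[tr]).isdisjoint(students[te])

# (b) new items: no item appears in both train and test
tr, te = next(iter(gkf.split(responses, groups=items)))
assert set(items[tr]).isdisjoint(items[te])

# (c) missing responses: plain row-wise folds over the response table
tr, te = next(iter(KFold(n_splits=5, shuffle=True, random_state=0).split(responses)))
print(len(tr), len(te))
```

Choosing the splitter to match the deployment scenario is what makes the three accuracy comparisons in the study meaningful.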
- Conference Article
3
- 10.23919/iconac.2019.8895012
- Sep 1, 2019
Traditional vital signs are an essential part of triage assessment in emergency departments (ED), and have been widely used in trauma prediction models. Previous researchers have studied the effect of vital signs scores on predicting traumatic injury outcomes and have found it to be significant. Based on the vital signs’ scores, an Interpretable Machine Learning (IML) method is proposed to predict patient outcomes and is compared with various ML algorithms. Results indicate that the IML method has a comparable performance with a mean AUC of 0.683, and its interpretability would help in the early identification of trauma patients at risk of mortality.
- Research Article
- 10.1680/jbren.24.00056
- Jun 4, 2025
- Proceedings of the Institution of Civil Engineers - Bridge Engineering
Ultra-high-performance concrete (UHPC) bonded to normal concrete (NC) can significantly enhance the mechanical performance of UHPC–NC composite structures, and the interface shear strength is a crucial indicator for assessing the bonding performance. In this study, interpretable machine learning (ML) methods were used to analyse the effects of different parameters on interface shear strength. A database consisting of 305 UHPC–NC shear tests was created, and the isolation forest algorithm was applied to filter outliers. Subsequently, four ML models were trained to predict the interface shear strength of UHPC–NC composite structures. Among them, the extreme gradient boosting (XGBoost) model demonstrated the highest prediction accuracy, achieving an R2 value of 0.95. Shapley additive explanations (SHAP), partial dependence plots (PDP) and individual conditional expectation (ICE) were used for feature importance analysis, aiding in the interpretation of the ‘black box’ nature of the ML models. The results demonstrate that the normal compressive stress at the interface is the most influential factor affecting interfacial shear strength. Finally, a physically meaningful predictive equation for the interface shear strength of UHPC–NC composite structures was proposed based on the XGBoost model combined with curve fitting. This equation enhances the prediction accuracy of interface shear strength for UHPC–NC structures and offers deeper insights into the model’s decision making process.
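The final step described above, distilling a trained model into a closed-form predictive equation via curve fitting, can be sketched as follows. Scikit-learn's gradient boosting stands in for XGBoost, the features are synthetic stand-ins for the interface parameters, and a simple quadratic is fitted to the model's partial dependence on the dominant feature (the paper's actual equation form is not reproduced):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(11)
X = rng.uniform(0, 5, size=(400, 3))     # e.g. normal stress, roughness, strength
y = 2.0 * np.sqrt(X[:, 0]) + 0.3 * X[:, 1] + 0.1 * rng.normal(size=400)

model = GradientBoostingRegressor(random_state=11).fit(X, y)

# manual partial dependence of the dominant feature over an even grid
grid = np.linspace(0.1, 4.9, 30)
avg = np.array([
    model.predict(np.column_stack([np.full(len(X), g), X[:, 1], X[:, 2]])).mean()
    for g in grid
])

# fit a simple closed-form curve to the PD profile (a quadratic here)
coefs = np.polyfit(grid, avg, deg=2)
print(np.round(coefs, 2))
```

The fitted coefficients give engineers an equation they can apply without the model, at the cost of whatever structure the chosen functional form cannot capture.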