Predicting bike-sharing demand using light gradient boosting machine and Shapley additive explanations values: a cross-regional generalisability study
The promotion of sustainable travel methods, such as public transportation, walking and bike-sharing, is being carried out in many countries around the world to raise awareness of the harmful effects of motorised traffic on the environment and form sustainable travel habits. Bike-sharing is considered a valuable option as it contributes to emission-reduction goals. This study investigates the transferability of a unified light gradient boosting machine (LightGBM) framework for bike-sharing demand prediction across three distinct socio-economic and climatic urban archetypes, namely Seoul, Washington D.C. and London, using variables including temperature, humidity, wind speed, season, hour, working day or holiday and location. While previous research focuses on localised models, this study tests the hypothesis that a single, high-fidelity model can transcend geographical heterogeneity. The results, validated through ten-fold cross-validation to ensure robustness, show that the predictive LightGBM model has a coefficient of determination, R2, of 0.947, root mean square error of 195.532 and mean absolute error of 107.548. Shapley additive explanations interpretability reveals that while temporal cycles and thermal comfort are universal predictors, the location feature captures latent socio-technical maturity, where London exhibits significantly higher peak-hour demand intensity compared to Washington D.C. and Seoul.
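The three scores quoted above (R2, root mean square error and mean absolute error) can be computed from paired observations and predictions. The following is a minimal sketch in plain Python; the function name and the rental counts are hypothetical and are not taken from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (r2, rmse, mae) for paired observations and predictions."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual and total sums of squares, as used in the usual R2 definition.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return r2, rmse, mae


# Hypothetical hourly rental counts, for illustration only.
observed = [120, 340, 560, 210]
predicted = [100, 360, 540, 230]
r2, rmse, mae = regression_metrics(observed, predicted)
print(f"R2={r2:.3f} RMSE={rmse:.1f} MAE={mae:.1f}")
```

In a ten-fold cross-validation these metrics would typically be computed on each held-out fold and then averaged.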
- Research Article
- 10.3390/rs18010040
- Dec 23, 2025
- Remote Sensing
The leaf area index (LAI) serves as a critical parameter for assessing wetland ecosystem functions, and accurate LAI retrieval holds substantial significance for wetland conservation and ecological monitoring. To address the spatial constraints of traditional ground-based measurements and the limited accuracy of single-source remote sensing data, this study utilized unmanned aerial vehicle (UAV)-borne hyperspectral and LiDAR sensors to acquire high-quality multi-source remote sensing data of coastal wetlands in the Yellow River Delta. Three machine learning algorithms, namely random forest (RF), Extreme Gradient Boosting (XGBoost), and Categorical Boosting (CatBoost), were employed for LAI retrieval modeling. A total of 38 vegetation indices (VIs) and 12 point cloud features (PCFs) were extracted from hyperspectral imagery and LiDAR point cloud data, respectively. Pearson correlation analysis and the Shapley Additive Explanations (SHAP) method were integrated to identify and select the most informative VIs and PCFs. The performance of LAI retrieval models built on single-source features (VIs or PCFs) or multi-source feature fusion was evaluated using the coefficient of determination (R2) and root mean square error (RMSE). The main findings are as follows: (1) Multi-source feature fusion significantly improved LAI retrieval accuracy, with the RF model achieving the highest performance (R2 = 0.968, RMSE = 0.125). (2) LiDAR-derived structural metrics and hyperspectral-derived vegetation indices were identified as critical factors for accurate LAI retrieval. (3) The feature selection method integrating mean absolute SHAP values (|SHAP| values) with Pearson correlation analysis enhanced model robustness. (4) The intertidal zone exhibited pronounced spatial heterogeneity in the vegetation LAI distribution.
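The abstract does not spell out how mean absolute SHAP values and Pearson correlation were combined. One plausible reading, sketched below under that assumption, ranks features by mean |SHAP| and then greedily drops any feature that is strongly correlated with one already kept. The threshold `max_abs_r`, the feature names and all numbers are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def select_features(columns, shap_columns, max_abs_r=0.9):
    """Rank by mean |SHAP|, then skip features redundant with kept ones."""
    importance = {
        name: sum(abs(v) for v in vals) / len(vals)
        for name, vals in shap_columns.items()
    }
    ranked = sorted(importance, key=importance.get, reverse=True)
    kept = []
    for name in ranked:
        # Keep a feature only if it is not highly correlated with any kept one.
        if all(abs(pearson(columns[name], columns[k])) < max_abs_r for k in kept):
            kept.append(name)
    return kept


# Hypothetical feature values and per-sample SHAP values (illustration only).
columns = {
    "ndvi": [0.2, 0.4, 0.6, 0.8],
    "evi": [0.21, 0.39, 0.62, 0.79],
    "canopy_height": [1.5, 0.7, 1.1, 0.9],
}
shap_columns = {
    "ndvi": [0.3, -0.2, 0.25, -0.35],
    "evi": [0.1, -0.1, 0.1, -0.1],
    "canopy_height": [0.2, -0.15, 0.2, -0.1],
}
selected = select_features(columns, shap_columns)
print(selected)
```

Here the hypothetical `evi` column is almost collinear with `ndvi`, so it is dropped even though its SHAP importance is non-zero.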
- Research Article
- 10.1093/jamiaopen/ooag020
- Feb 1, 2026
- JAMIA open
This study aimed to assess the interpretability and acceptance of Shapley values for making artificial intelligence/machine learning (AI/ML) tools more transparent, interpretable, and useful to clinicians. Structured assessments were conducted with 30 clinicians (15 providers; 15 nurses; 8 assessments per clinician) to evaluate their ability to understand interventional Shapley Additive exPlanations (SHAP) values, a type of Shapley value that provides individualized variable importance scores, and to ascertain their perspective on SHAP value utility for the use of an AI/ML sepsis diagnostic. Participants were shown the diagnostic interface for real clinical scenarios with de-identified patient data with and without SHAP values. The primary outcomes were clinician ability to correctly interpret SHAP values and clinician self-reported improvement in their understanding of how the AI/ML algorithm produced its result. Participants correctly interpreted SHAP values in 235 of 240 assessments (98%; CI, 95%-99%) and reported SHAP values improved their understanding of how the algorithm produced its result in every case (240/240; 100%; CI, 99%-100%). Participants were unanimous (30/30) in preferring the interface with SHAP values over the interface without. Clinician participants strongly preferred the device interface with SHAP values, were unanimous in reporting SHAP values improved their understanding of the AI/ML diagnostic, and scored nearly perfectly when asked to interpret SHAP values. These results suggest health care providers value transparency into AI/ML algorithms designed for clinical use, and that Shapley values are a useful approach to providing that transparency, which in turn may improve tool adoption and clinical utility.
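The abstract does not state which method produced its confidence intervals. As an illustration only, the sketch below computes a Wilson score interval for the 235 of 240 correct interpretations. The choice of the Wilson method and the z value of 1.96 are assumptions.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


lo, hi = wilson_interval(235, 240)
print(f"{lo:.3f} to {hi:.3f}")
```

For these counts the interval rounds to roughly 95% to 99%, but that agreement does not show that the authors used this method.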
- Research Article
12
- 10.1186/s13075-022-02918-3
- Jan 1, 2022
- Arthritis Research & Therapy
Background: The purpose of this study was to stratify patients with rheumatoid arthritis (RA) according to the trend of disease activity by trajectory-based clustering and to identify contributing factors for treatment response to biologic and targeted synthetic disease-modifying anti-rheumatic drugs (DMARDs) according to trajectory groups. Methods: We analyzed the data from a nationwide RA cohort from the Korean College of Rheumatology Biologics and Targeted Therapy registry. Patients treated with second-line biologic and targeted synthetic DMARDs were included. Trajectory modeling for clustering was used to group the disease activity trend. The contributing factors for each trajectory were investigated using SHAP (SHapley Additive exPlanations) values from a machine learning model. Results: The trends in the disease activity of 688 RA patients were clustered into 4 groups: rapid decrease and stable disease activity (group 1, n = 319), rapid decrease followed by an increase (group 2, n = 36), slow and continued decrease (group 3, n = 290), and no decrease in disease activity (group 4, n = 43). SHAP plots indicated that the most important features of group 2 compared to group 1 were the baseline erythrocyte sedimentation rate (ESR), prednisolone dose, and disease activity score with 28-joint assessment (DAS28) (SHAP value 0.308, 0.157, and 0.103, respectively). The most important features of group 3 compared to group 1 were the baseline ESR, DAS28, and estimated glomerular filtration rate (eGFR) (SHAP value 0.175, 0.164, and 0.042, respectively). The most important features of group 4 compared to group 1 were the baseline DAS28, ESR, and blood urea nitrogen (BUN) (SHAP value 0.387, 0.153, and 0.144, respectively). Conclusions: The trajectory-based approach was useful for clustering the treatment response of biologic and targeted synthetic DMARDs in patients with RA. In addition, baseline DAS28, ESR, prednisolone dose, eGFR, and BUN were important contributing factors for 4-year trajectories.
- Research Article
8
- 10.3390/en16207210
- Oct 23, 2023
- Energies
Building electric energy is characterized by a significant increase in its uses (e.g., vehicle charging), a rapidly declining cost of all related data collection, and a proliferation of smart grid concepts, including diverse and flexible electricity pricing schemes. Not surprisingly, an increased number of approaches have been proposed for its modeling and forecasting. In this work, we place our emphasis on three forecasting-related issues. First, we look at the forecasting explainability, that is, the ability to understand and explain to the user what shapes the forecast. To this extent, we rely on concepts and approaches that are inherently explainable, such as the evolutionary approach of genetic programming (GP) and its associated symbolic expressions, as well as the so-called SHAP (SHapley Additive eXplanations) values, which is a well-established model agnostic approach for explainability, especially in terms of feature importance. Second, we investigate the impact of the training timeframe on the forecasting accuracy; this is driven by the realization that fast training would allow for faster deployment of forecasting in real-life solutions. And third, we explore the concept of counterfactual analysis on actionable features, that is, features that the user can really act upon and which therefore present an inherent advantage when it comes to decision support. We have found that SHAP values can provide important insights into the model explainability. In our analysis, GP models demonstrated superior performance compared to neural network-based models (with a 20–30% reduction in Root Mean Square Error (RMSE)) and time series models (with a 20–40% lower RMSE), but a rather questionable potential to produce crisp and insightful symbolic expressions, allowing a better insight into the model performance. 
We also found that counterfactuals built on actionable features, together with short training timeframes, hold considerable potential, especially for practical decision support.
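A counterfactual on an actionable feature asks how the forecast would change if the user altered something they control. The sketch below illustrates the idea with a hypothetical linear surrogate model. The feature names, the coefficients and the set of actionable features are invented for illustration and do not come from the paper.

```python
# Features the user can actually change (hypothetical choice).
ACTIONABLE = {"setpoint_c"}


def forecast_kwh(features):
    """Hypothetical surrogate forecast; coefficients are made up."""
    return 50.0 + 2.0 * features["outdoor_temp_c"] - 3.0 * features["setpoint_c"]


def counterfactual_delta(model, features, changes):
    """Change in forecast when only actionable features are altered."""
    not_allowed = set(changes) - ACTIONABLE
    if not_allowed:
        raise ValueError(f"not actionable: {sorted(not_allowed)}")
    altered = {**features, **changes}
    return model(altered) - model(features)


baseline = {"outdoor_temp_c": 30.0, "setpoint_c": 22.0}
# What if the cooling setpoint were raised by two degrees?
delta = counterfactual_delta(forecast_kwh, baseline, {"setpoint_c": 24.0})
print(delta)
```

Restricting `changes` to actionable features keeps the counterfactual tied to a decision the user could really take, which is the point the abstract makes about decision support.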
- Research Article
1
- 10.1200/jco.2023.41.16_suppl.e13539
- Jun 1, 2023
- Journal of Clinical Oncology
e13539 Background: In The US Oncology Network (The Network), about one-third of new patients with a cancer diagnosis started intravenous (IV) treatment after their first visit. The rest of the patients either came in for a consult only or might have received other treatments such as radiation, surgery, or oral therapy. We developed a machine learning model to predict IV treatment initiation among new patients and discovered features associated with the patient’s decision. This model could suggest interventions to improve patients’ access to care. Methods: A retrospective cohort was formed by identifying new patients with cancer from 27 practices in The Network between July 1, 2021 and June 30, 2022. Structured data were extracted and processed from the electronic health records, claims, physician referrals, and the American Community Survey. Patient characteristics included demographics, clinical information, payor types, and socioeconomic status. The referral pattern and the geographic region of practices, and the provider workload were considered as well. Gradient-boosted decision trees, random forest, neural network, and logistic regression models were developed to predict the probability of starting IV treatment within 90 days of the first visit. Model performance was evaluated based on the area under the receiver operating characteristic (AUROC) curve using cross-validation and a 4:1 training/validation random split. Shapley Additive Explanations (SHAP) values were applied to the model to explain feature importance. Results: A total of 117,340 new patients with a cancer diagnosis were included in the study, of whom 35% initiated IV treatment within 90 days of the first visit. A gradient-boosted decision tree algorithm with control of the imbalanced label was chosen as the final model because of the performance and the ability to handle missing values. The model achieved an AUROC of 0.80 on the validation dataset with both cross-validation and the 4:1 training/validation random split. Based on the SHAP values (log odds), we found that clinical information including diagnosis and stage is the most important feature to predict the initiation of IV treatment (mean absolute SHAP = 0.31 and 1.03, respectively). Medicaid contributes least to treatment initiation among all insurance types (mean absolute SHAP = 0.01). In addition, younger age and male patients have a higher chance to start IV treatment (Pearson correlation = -0.41, p-value < 0.01 for age versus SHAP values; p-value < 0.01, two-sided t-test for SHAP values by gender). Conclusions: This study reports a machine learning model to predict IV treatment initiation among new patients with cancer. Clinical features impact the treatment decision more than others. This model could guide patient service and direct personalized care navigation. Further, the model sheds light on future interventions that could enhance patient access to treatment promptly.
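Because the SHAP values here are expressed in log odds, their effect on the predicted probability depends on the starting point. The sketch below shows the conversion, assuming for illustration a baseline equal to the cohort's 35% initiation rate and a single contribution of 1.03 log-odds units. Treating a mean absolute SHAP value as one patient's contribution is a simplification made only for this example.

```python
import math


def sigmoid(z):
    """Map log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Map a probability to log odds."""
    return math.log(p / (1.0 - p))


def shifted_probability(base_probability, shap_log_odds):
    """Probability after adding a log-odds contribution to the baseline."""
    return sigmoid(logit(base_probability) + shap_log_odds)


base_p = 0.35           # cohort-level initiation rate quoted in the abstract
contribution = 1.03     # illustrative log-odds contribution
new_p = shifted_probability(base_p, contribution)
print(f"{base_p:.2f} -> {new_p:.2f}")
```

The same log-odds contribution would move the probability by a different amount from a different baseline, which is why log-odds SHAP values are not directly readable as percentage-point changes.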
- Research Article
3
- 10.1186/s12888-024-06074-7
- Oct 5, 2024
- BMC Psychiatry
Background: A better understanding of the relationships between insomnia and anxiety, mood, eating, and alcohol-use disorders is needed given its prevalence among young adults. Supervised machine learning provides the ability to evaluate which mental disorder is most associated with heightened insomnia among U.S. college students. Combined with Bayesian network analysis, probable directional relationships between insomnia and interacting symptoms may be illuminated. Methods: The current exploratory analyses utilized a national sample of college students across 26 U.S. colleges and universities collected during population-level screening before entering a randomized controlled trial. We used a 4-step statistical approach: (1) at the disorder level, an elastic net regularization model examined the relative importance of the association between insomnia and 7 mental disorders (major depressive disorder, generalized anxiety disorder, social anxiety disorder, panic disorder, post-traumatic stress disorder, anorexia nervosa, and alcohol use disorder); (2) this model was evaluated within a hold-out sample; (3) at the symptom level, a completed partially directed acyclic graph (CPDAG) was computed via a Bayesian hill-climbing algorithm to estimate potential directionality among insomnia and its most associated disorder [based on SHAP (SHapley Additive exPlanations) values]; (4) the CPDAG was then tested for generalizability by assessing (in)equality within a hold-out sample using structural Hamming distance (SHD). Results: Of 31,285 participants, 20,597 were women (65.8%); mean (standard deviation) age was 22.96 (4.52) years. The elastic net model demonstrated clinical significance in predicting insomnia severity in the training sample [R2 = .44 (.01); RMSE = 5.00 (0.08)], with comparable performance in the hold-out sample (R2 = .33; RMSE = 5.47). SHAP values indicated that the presence of any mental disorder was associated with higher insomnia scores, with major depressive disorder as the most important disorder associated with heightened insomnia (mean |SHAP| = 3.18). The training CPDAG and hold-out CPDAG (SHD = 7) suggested depression symptoms presupposed insomnia with depressed mood, fatigue, and self-esteem as key parent nodes. Conclusion: These findings provide insights into the associations between insomnia and mental disorders among college students and warrant further investigation into the potential direction of causality between insomnia and depression. Trial registration: Trial was registered on the National Institute of Health RePORTER website (R01MH115128 || 23/08/2018).
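Structural Hamming distance counts how many node pairs are connected differently in two graphs. The sketch below uses one common convention in which each differing pair (missing, extra, reversed, or directed versus undirected edge) counts once. The abstract does not say which convention was used, and the node names and edges are hypothetical.

```python
from itertools import combinations


def edge_state(edges, a, b):
    """Relation between a and b: 'none', 'a->b', 'b->a' or 'undirected'."""
    forward, backward = (a, b) in edges, (b, a) in edges
    if forward and backward:
        return "undirected"  # an undirected edge is stored in both directions
    if forward:
        return "a->b"
    if backward:
        return "b->a"
    return "none"


def structural_hamming_distance(nodes, edges_1, edges_2):
    """Number of node pairs whose edge state differs between two graphs."""
    return sum(
        edge_state(edges_1, a, b) != edge_state(edges_2, a, b)
        for a, b in combinations(sorted(nodes), 2)
    )


# Hypothetical three-node graphs, for illustration only.
nodes = {"depressed_mood", "fatigue", "insomnia"}
training = {("depressed_mood", "insomnia"), ("fatigue", "insomnia")}
holdout = {
    ("depressed_mood", "insomnia"),
    ("fatigue", "insomnia"),
    ("insomnia", "fatigue"),  # fatigue and insomnia left undirected here
}
shd = structural_hamming_distance(nodes, training, holdout)
print(shd)
```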
- Research Article
80
- 10.1016/j.compag.2024.108627
- Jan 13, 2024
- Computers and Electronics in Agriculture
SHAP values accurately explain the difference in modeling accuracy of convolution neural network between soil full-spectrum and feature-spectrum
- Research Article
2
- 10.3389/fphy.2023.1217275
- Aug 10, 2023
- Frontiers in Physics
Background and objectives: Implementation of patient-specific quality assurance (PSQA) is a crucial aspect of precise radiotherapy. Various machine learning-based models have shown potential as virtual quality assurance tools, being capable of accurately predicting the dose verification results of fixed-beam intensity-modulated radiation therapy (IMRT) or volumetric modulated arc therapy (VMAT) plans, thereby ensuring safe and efficient treatment for patients. However, there has been no research yet that simultaneously integrates different IMRT techniques to predict the gamma pass rate (GPR) and explain the model. Methods: A retrospective analysis of the 3D dosimetric verification results based on measurements with gamma pass rate criteria of 3%/2 mm and 10% dose threshold of 409 pelvic IMRT and VMAT plans was carried out. Radiomics features were extracted from the dose files, from which the XGBoost algorithm based on SHapley Additive exPlanations (SHAP) values was used to select the optimal feature subset as the input for the prediction model. The study employed four different machine learning algorithms, namely, random forest (RF), adaptive boosting (AdaBoost), extreme gradient boosting (XGBoost), and light gradient boosting machine (LightGBM), to construct predictive models. Sensitivity, specificity, F1 score, and AUC value were calculated to evaluate the classification performance of these models. The SHAP values were utilized to perform a related interpretive analysis on the best performing model. Results: The sensitivities and specificities of the RF, AdaBoost, XGBoost, and LightGBM models were 0.96, 0.82, 0.93, and 0.89, and 0.38, 0.54, 0.62, and 0.62, respectively. The F1 scores and area under the curve (AUC) values were 0.86, 0.81, 0.88, and 0.86, and 0.81, 0.77, 0.85, and 0.83, respectively. The explanation of the model output based on SHAP values can provide a reference basis for medical physicists when adjusting the plan, thereby improving the efficiency and quality of treatment plans. Conclusion: It is feasible to use a machine learning method based on radiomics to establish a gamma pass rate classification prediction model for IMRT and VMAT plans in the pelvis. The XGBoost model performs better in classification than the other three tree-based ensemble models, and global explanations and single-sample explanations of the model output through SHAP values may offer reference for medical physicists to provide high-quality plans, promoting the clinical application and implementation of GPR prediction models, and providing safe and efficient personalized QA management for patients.
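Sensitivity, specificity and F1 score all derive from the confusion matrix of a binary classifier. The sketch below computes them in plain Python. The labels are hypothetical, and which class counts as positive (for example, a plan passing the gamma criterion) is an assumption the abstract does not settle.

```python
def classification_metrics(y_true, y_pred):
    """Return (sensitivity, specificity, f1) for binary labels 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, f1


# Hypothetical labels, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
sensitivity, specificity, f1 = classification_metrics(y_true, y_pred)
print(sensitivity, specificity, f1)
```

A model can score high on sensitivity while scoring low on specificity, as the RF figures in the abstract (0.96 and 0.38) show, so the two are best read together.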
- Research Article
1
- 10.3390/en19010124
- Dec 25, 2025
- Energies
Traditional single-prediction models often exhibit limitations in meeting wind power prediction requirements in complex operational scenarios. Furthermore, the inherent “black-box” nature of deep learning models leads to limited interpretability of predictions, hindering effective support for grid dispatch planning. To address these issues, this study proposes a novel day-ahead wind power prediction method, referred to as SHapley Additive exPlanations (SHAP)–Mixture of Experts (MoE), which integrates SHAP into an MoE framework. Here, SHAP is employed for interpretability purposes. This study innovatively transforms SHAP analysis into prior knowledge to guide the decision-making of the MoE gating network and proposes a two-layer dynamic interpretation mechanism based on the collaborative analysis of gating weights and SHAP values. This approach clarifies key meteorological factors and the model’s advantageous scenarios, while quantifying the uncertainty among multiple expert decisions. Firstly, each expert model was pre-trained, and its parameters were frozen to construct a candidate expert pool. Secondly, the SHAP vectors for each pre-trained expert were computed over all sample features to characterize their decision-making logic under varying scenarios. Thirdly, an augmented feature set was constructed by fusing the original meteorological features with SHAP attribution matrices from all experts; this set was used to train the gating network within the MoE framework. Finally, for new input samples, each frozen expert model generates a prediction along with its corresponding SHAP vector, and the gating network aggregates these predictions to produce the final forecast. The proposed method was validated using operational data from an offshore wind farm located in southeastern China. Compared with the best individual expert model and traditional ensemble forecasting models, the proposed method reduces the Root Mean Square Error (RMSE) by 0.23% to 4.92%. 
Furthermore, the method elucidates the influence of key features on each expert’s decisions, offering insights into how the gating network adaptively selects experts based on the input features and expert-specific characteristics across different scenarios.
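The final aggregation step of a mixture of experts can be sketched as a softmax-weighted sum of the frozen experts' predictions. In the sketch below the gating scores are passed in directly. In the method described above they would come from a gating network fed with the meteorological features and the experts' SHAP vectors, which is not modelled here. All numbers are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax that turns scores into weights summing to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forecast(expert_predictions, gate_scores):
    """Weighted sum of expert predictions using softmax gating weights."""
    weights = softmax(gate_scores)
    forecast = sum(w * p for w, p in zip(weights, expert_predictions))
    return forecast, weights


# Hypothetical day-ahead power predictions (MW) from three frozen experts.
predictions = [10.0, 12.0, 14.0]
forecast, weights = moe_forecast(predictions, gate_scores=[0.0, 0.0, 0.0])
print(forecast, weights)
```

With equal gating scores the forecast is the plain average, and raising one expert's score shifts the forecast towards that expert's prediction.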
- Research Article
40
- 10.1001/jamapsychiatry.2022.4634
- Jan 18, 2023
- JAMA Psychiatry
The months after psychiatric hospital discharge are a time of high risk for suicide. Intensive postdischarge case management, although potentially effective in suicide prevention, is likely to be cost-effective only if targeted at high-risk patients. A previously developed machine learning (ML) model showed that postdischarge suicides can be predicted from electronic health records and geospatial data, but it is unknown whether prediction could be improved by adding further information. The objective was to determine whether model prediction could be improved by adding information extracted from clinical notes and public records. Models were trained to predict suicides in the 12 months after Veterans Health Administration (VHA) short-term (less than 365 days) psychiatric hospitalizations between the beginning of 2010 and September 1, 2012 (299 050 hospitalizations, with 916 hospitalizations followed within 12 months by suicides) and tested in the hospitalizations from September 2, 2012, to December 31, 2013 (149 738 hospitalizations, with 393 hospitalizations followed within 12 months by suicides). Validation focused on net benefit across a range of plausible decision thresholds. Predictor importance was assessed with Shapley additive explanations (SHAP) values. Data were analyzed from January to August 2022. Suicides were defined by the National Death Index. Base model predictors included VHA electronic health records and patient residential data. The expanded predictors came from natural language processing (NLP) of clinical notes and a social determinants of health (SDOH) public records database. The model included 448 788 unique hospitalizations. Net benefit over risk horizons between 3 and 12 months was generally highest for the model that included both NLP and SDOH predictors (area under the receiver operating characteristic curve range, 0.747-0.780; area under the precision recall curve relative to the suicide rate range, 3.87-5.75).
NLP and SDOH predictors also had the highest predictor class-level SHAP values (proportional SHAP = 64.0% and 49.3%, respectively), although the single highest positive variable-level SHAP value was for a count of medications classified by the US Food and Drug Administration as increasing suicide risk prescribed the year before hospitalization (proportional SHAP = 15.0%). In this study, clinical notes and public records were found to improve ML model prediction of suicide after psychiatric hospitalization. The model had positive net benefit over 3-month to 12-month risk horizons for plausible decision thresholds. Although caution is needed in inferring causality based on predictor importance, several key predictors have potential intervention implications that should be investigated in future studies.
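Net benefit, as used in decision curve analysis, weighs true positives against false positives at a chosen risk threshold. The sketch below uses the standard formula, true positives per patient minus false positives per patient scaled by threshold / (1 - threshold). The outcomes and predicted risks are hypothetical and do not reflect the study data.

```python
def net_benefit(y_true, risk, threshold):
    """Net benefit of intervening on everyone with risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 0)
    # False positives are down-weighted by the odds of the threshold.
    return tp / n - (fp / n) * (threshold / (1.0 - threshold))


# Hypothetical outcomes (1 = event) and predicted risks, illustration only.
outcomes = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
risks = [0.9, 0.8, 0.1, 0.2, 0.3, 0.05, 0.1, 0.6, 0.2, 0.1]
nb = net_benefit(outcomes, risks, threshold=0.2)
print(nb)
```

A model has positive net benefit at a threshold when the value of the cases it catches outweighs the cost of the unnecessary interventions it triggers.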
- Research Article
434
- 10.1186/s40537-024-00905-w
- Mar 26, 2024
- Journal of Big Data
In the context of high-dimensional credit card fraud data, researchers and practitioners commonly utilize feature selection techniques to enhance the performance of fraud detection models. This study presents a comparison in model performance using the most important features selected by SHAP (SHapley Additive exPlanations) values and the model’s built-in feature importance list. Both methods rank features and choose the most significant ones for model assessment. To evaluate the effectiveness of these feature selection techniques, classification models are built using five classifiers: XGBoost, Decision Tree, CatBoost, Extremely Randomized Trees, and Random Forest. The Area under the Precision-Recall Curve (AUPRC) serves as the evaluation metric. All experiments are executed on the Kaggle Credit Card Fraud Detection Dataset. The experimental outcomes and statistical tests indicate that feature selection methods based on importance values outperform those based on SHAP values across classifiers and various feature subset sizes. For models trained on larger datasets, it is recommended to use the model’s built-in feature importance list as the primary feature selection method over SHAP. This suggestion is based on the rationale that computing SHAP feature importance is a distinct activity, while models naturally provide built-in feature importance as part of the training process, requiring no additional effort. Consequently, opting for the model’s built-in feature importance list can offer a more efficient and practical approach for larger datasets and more intricate models.
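The evaluation metric named above, the area under the precision-recall curve, is often estimated as average precision, which is the mean of the precision values at the ranks where a positive case is retrieved. The sketch below implements that estimator in plain Python and ignores tied scores. Whether the study used this exact estimator is not stated, and the labels and scores are hypothetical.

```python
def average_precision(y_true, scores):
    """Average precision: mean precision at the rank of each positive."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_positives = sum(y_true)
    hits = 0
    precision_sum = 0.0
    for rank, index in enumerate(order, start=1):
        if y_true[index] == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / total_positives


# Hypothetical fraud labels (1 = fraud) and model scores, illustration only.
labels = [0, 1, 0, 1, 0]
scores = [0.1, 0.9, 0.8, 0.7, 0.2]
ap = average_precision(labels, scores)
print(ap)
```

Comparing two feature subsets then amounts to training the same classifier on each subset and comparing the resulting scores on held-out data.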
- Research Article
- 10.1164/ajrccm.2025.211.abstracts.a7167
- May 1, 2025
- American Journal of Respiratory and Critical Care Medicine
Background: Acute Respiratory Distress Syndrome (ARDS) management emphasizes lung-protective ventilation strategies, including limiting driving pressure, positive end-expiratory pressure (PEEP), and tidal volume. This study applied causal mediation analysis and SHAP (SHapley Additive exPlanations) values to evaluate the relative contributions of driving pressure, PEEP, and tidal volume settings on death60 (mortality at 60 days). Methods: Data were obtained from 1,291 ARDS patients, examining key ventilatory and clinical variables: driving pressure, driving pressure thresholds (<12 cm H₂O, <15 cm H₂O), PEEP, and low tidal volume (<6 mL/kg). Multivariable logistic regression models assessed the direct and indirect effects on death60, adjusted through causal mediation analysis with bootstrapped confidence intervals (500 simulations). Additionally, SHAP values were generated to determine variable importance for death60 outcomes, removing other mortality endpoints (death90) and secondary outcomes (icufd, vfd) for focused analysis. Results: Driving pressure was a significant positive predictor of both death60 (Coefficient = 0.0328, p = 0.0044) and death90 (Coefficient = 0.0331, p = 0.0037), indicating increased mortality with higher driving pressures. Stratified analyses revealed that driving pressures below 12 cm H₂O were not significantly associated with death60 (p = 0.381), though pressures below 15 cm H₂O remained protective (Coefficient = -0.3866, p = 0.0030). SHAP analysis confirmed driving pressure as a primary contributor to death60, followed by PEEP and low tidal volume settings. PEEP was positively associated with mortality (death60: Coefficient = 0.0560, p < 0.0001), but had a protective effect on icufd and vfd. Low tidal volumes <6 mL/kg were independently associated with decreased ICU and ventilator dependency (icufd Coefficient = -4.3527, p < 0.001; vfd Coefficient = -3.7670, p < 0.01). Causal mediation analysis indicated that compliance mediated only a minor proportion of the driving pressure effect on mortality (Average Causal Mediation Effect [ACME] = -0.0016, p = 0.31). Direct effects (ADE) of driving pressure remained significant for death60 (ADE = 0.0058, p = 0.012). Conclusions: Driving pressure and PEEP significantly impact mortality in ARDS patients, with notable threshold effects at driving pressures <15 cm H₂O. Lower tidal volumes offer benefits for ICU and ventilator dependency without increasing mortality risk. Causal mediation analysis underscores that compliance plays a limited mediating role in driving pressure's impact on mortality, suggesting that direct effects predominate. These findings support current lung-protective strategies and offer insights into tailoring ventilatory parameters to reduce ARDS-related mortality and improve ICU outcomes.
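Bootstrapped confidence intervals are obtained by resampling the data with replacement, recomputing the statistic each time, and reading off percentiles. The sketch below applies a percentile bootstrap with 500 resamples to a simple mean. The study bootstrapped mediation effects, which are not reproduced here, and the outcome data are hypothetical.

```python
import random


def bootstrap_ci(values, statistic, n_resamples=500, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for statistic(values)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    estimates = sorted(
        statistic([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lower_index = int((alpha / 2) * n_resamples)
    upper_index = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lower_index], estimates[upper_index]


def mean(values):
    return sum(values) / len(values)


# Hypothetical 60-day mortality indicators (1 = died), illustration only.
deaths = [1] * 30 + [0] * 70
ci_low, ci_high = bootstrap_ci(deaths, mean)
print(ci_low, ci_high)
```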
- Research Article
14
- 10.3390/electronics13091628
- Apr 24, 2024
- Electronics
Artificial Intelligence (AI) and Machine Learning (ML) models are increasingly leveraged to automate and optimize Data Centre (DC) operations. However, the interpretability and transparency of these complex models pose critical challenges. Hence, this paper explores Shapley Additive exPlanations (SHAP) values, a model explainability method, for addressing the interpretability and transparency challenges of predictive maintenance models. This method computes and assigns Shapley values for each feature, then quantifies and assesses their impact on the model’s output. By quantifying the contribution of each feature, SHAP values can assist DC operators in understanding the underlying reasoning behind the model’s output in order to make proactive decisions. As DC operations change dynamically, we additionally investigate how SHAP can capture the temporal behavior of feature importance in the DC environment. We validate our approach with selected predictive models using an actual dataset from a High-Performance Computing (HPC) DC sourced from the Enea CRESCO6 cluster in Italy. The experimental analyses are formalized using summary, waterfall, force, and dependency explanations. We delve into temporal feature importance analysis to capture the features’ impact on model output over time. The results demonstrate that model explainability can improve model transparency and facilitate collaboration between DC operators and AI systems, which can enhance the operational efficiency and reliability of DCs by providing a quantitative assessment of each feature’s impact on the model’s output.
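For a small number of features, Shapley values can be computed exactly by enumerating every coalition of the other features and averaging the marginal contribution with the classical Shapley weights. The sketch below does this for a hypothetical three-feature risk model, with absent features set to a baseline value. The model, feature names and numbers are invented for illustration. Practical SHAP implementations rely on approximations or model-specific algorithms rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values by enumerating all coalitions of the others."""
    n = len(features)
    phi = {}
    for feature in features:
        others = [g for g in features if g != feature]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                gain = value_fn(coalition | {feature}) - value_fn(coalition)
                total += weight * gain
        phi[feature] = total
    return phi


def model(x):
    """Hypothetical risk score with one interaction term."""
    return (0.5 * x["cpu_temp"] + 0.1 * x["power"] - 0.01 * x["fan_rpm"]
            + 0.001 * x["cpu_temp"] * x["power"])


instance = {"cpu_temp": 80.0, "power": 300.0, "fan_rpm": 2000.0}
baseline = {"cpu_temp": 60.0, "power": 200.0, "fan_rpm": 3000.0}


def value_fn(coalition):
    # Features in the coalition take the instance value, the rest the baseline.
    mixed = {k: (instance[k] if k in coalition else baseline[k]) for k in instance}
    return model(mixed)


phi = shapley_values(value_fn, list(instance))
print(phi)
```

The values sum to the difference between the model output for the instance and for the baseline, which is the additivity property that waterfall and force plots visualise.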
- Research Article
5
- 10.1017/s0033291725000285
- Jan 1, 2025
- Psychological Medicine
Background: Patients with schizophrenia experience accelerated aging, accompanied by abnormalities in biomarkers such as shorter telomere length. Brain age prediction using neuroimaging data has gained attention in schizophrenia research, with consistently reported increases in brain-predicted age difference (brain-PAD). However, its associations with clinical symptoms and illness duration remain unclear. Methods: We developed brain age prediction models using structural magnetic resonance imaging (MRI) data from 10,938 healthy individuals. The models were validated on an independent test dataset comprising 79 healthy controls, 57 patients with recent-onset schizophrenia, and 71 patients with chronic schizophrenia. Group comparisons and the clinical associations of brain-PAD were analyzed using multiple linear regression. SHapley Additive exPlanations (SHAP) values estimated feature contributions to the model, and between-group differences in SHAP values and group-by-SHAP value interactions were also examined. Results: Patients with recent-onset schizophrenia and chronic schizophrenia exhibited increased brain-PAD values of 1.2 and 0.9 years, respectively. Between-group differences in SHAP values were identified in the right lateral prefrontal area (false discovery rate [FDR] p = 0.022), with group-by-SHAP value interactions observed in the left prefrontal area (FDR p = 0.049). A negative association between brain-PAD and Full-scale Intelligence Quotient scores in chronic schizophrenia was noted, which did not remain significant after correction for multiple comparisons. Conclusions: Brain-PAD increases were pronounced in the early phase of schizophrenia. Regional brain abnormalities contributing to brain-PAD likely vary with illness duration. Future longitudinal studies are required to overcome limitations related to sample size, heterogeneity, and the cross-sectional design of this study.
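Brain-PAD is commonly defined as the model-predicted brain age minus chronological age, so a positive value means the brain appears older than expected. The sketch below shows that arithmetic and a group mean. The ages are hypothetical, and the study's regression adjustments are not reproduced.

```python
def brain_pad(predicted_age, chronological_age):
    """Brain-predicted age difference in years (positive = older-appearing)."""
    return predicted_age - chronological_age


def mean_brain_pad(records):
    """Mean brain-PAD over (predicted_age, chronological_age) pairs."""
    pads = [brain_pad(predicted, actual) for predicted, actual in records]
    return sum(pads) / len(pads)


# Hypothetical (predicted, chronological) ages in years, illustration only.
group = [(31.5, 30.0), (42.0, 41.0), (25.1, 24.0)]
group_mean = mean_brain_pad(group)
print(round(group_mean, 2))
```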
- Research Article
- 10.3389/fendo.2025.1693166
- Nov 27, 2025
- Frontiers in Endocrinology
Background: Ischemic heart disease (IHD) and type 2 diabetes mellitus (T2DM) are leading causes of disability-adjusted life years globally among adults aged 55 years and older. Although both diseases share common risk factors and pathophysiological pathways, previous research has predominantly addressed these conditions in isolation. The co-occurrence patterns and regional variations of IHD and T2DM burden remain poorly understood. We aimed to characterize the global co-occurrence patterns of IHD and T2DM from a spatial perspective and to identify the corresponding risk factors distinguishing different burden regions. Methods: Using data from the Global Burden of Diseases, Injuries, and Risk Factors Study (GBD) 2021 database, we extracted age-standardized disability-adjusted life year (DALY) rates for IHD and T2DM among individuals aged 55 years and older from 204 countries and territories. Based on quartile distributions of global DALY rates for both diseases, we classified countries into four distinct burden regions: Low-Burden Regions (56 countries), T2DM-Dominant Regions (46 countries), IHD-Dominant Regions (46 countries), and Dual-Burden Regions (56 countries). We examined temporal trends from 1990 to 2021, computed population attributable fractions for major risk factors, and used machine learning-based SHAP (Shapley Additive Explanations) analysis to screen and quantify the effects of corresponding risk factors distinguishing regional classifications. Results: Dual-Burden Regions were distributed across multiple geographic areas including the Caribbean and Central America, Persian Gulf states, Balkan Peninsula, Southeast Asia, West Africa, Eastern Mediterranean, and Northern Europe. The spatial distribution revealed distinct geographic clustering, with higher IHD rates in Eastern Europe and Central Asia, and elevated T2DM rates in Pacific Island nations and parts of the Middle East. Countries and territories with the highest burden for both diseases included North African countries (e.g., Morocco: IHD 25,193.1/100,000 and T2DM 32,197.24/100,000) and Pacific Island nations such as Fiji (IHD burden of 24,758.17 per 100,000 and T2DM burden of 32,197.24 per 100,000). Marshall Islands showed IHD burden of 25,107.72/100,000 and T2DM burden of 22,122.46/100,000, while Nauru demonstrated the highest IHD burden (39,483.92/100,000). High systolic blood pressure contributed most to IHD burden globally (49.79%), while high body-mass index dominated T2DM burden (51.89%). Environmental factors demonstrated clear regional gradients, with household air pollution ranging from 4.58% in Low-Burden to 14.43% in Dual-Burden Regions for IHD. High body-mass index contributed 51.89% to T2DM burden globally, with regional variation from 40.61% in IHD-Dominant to 51.36% in Low-Burden Regions. SHAP analysis identified sociodemographic index (SDI2021) as the primary factor distinguishing Low-Burden from Dual-Burden Regions for both IHD (mean |SHAP| = 1.245) and T2DM (mean |SHAP| = 1.317). Diet high in processed meat consistently showed strong discriminatory power across multiple regional comparisons for T2DM (SHAP values 0.923-1.721), while secondhand smoke emerged as a critical differentiator with SHAP values exceeding 1.0 across various regional distinctions. Diet low in vegetables served as a primary differentiator between Low-Burden and T2DM-Dominant Regions (mean |SHAP| = 1.188). Conclusion: The co-occurrence of IHD and T2DM exhibits pronounced global heterogeneity, with Pacific Island nations and multiple geographic regions including Gulf states, North Africa, and other areas bearing disproportionate dual-burden. Socioeconomic development level fundamentally characterizes dual-burden status, while dietary and environmental factors serve as key regional differentiators.
Intervening in modifiable risk factors, particularly processed meat consumption, vegetable intake, and environmental exposures, can fundamentally reduce the global burden of these co-occurring diseases.
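The abstract mentions quartile distributions but does not give the exact rule used to assign countries to the four burden regions. The sketch below assumes, purely for illustration, a median cut-off on each DALY rate, which yields the same four labels. The country names and rates are hypothetical.

```python
from statistics import median


def classify_regions(rates):
    """Label each country by whether its IHD and T2DM rates exceed the median."""
    ihd_cut = median(r["ihd"] for r in rates.values())
    t2dm_cut = median(r["t2dm"] for r in rates.values())
    labels = {}
    for country, r in rates.items():
        high_ihd = r["ihd"] > ihd_cut
        high_t2dm = r["t2dm"] > t2dm_cut
        if high_ihd and high_t2dm:
            labels[country] = "Dual-Burden"
        elif high_ihd:
            labels[country] = "IHD-Dominant"
        elif high_t2dm:
            labels[country] = "T2DM-Dominant"
        else:
            labels[country] = "Low-Burden"
    return labels


# Hypothetical DALY rates per 100,000, for illustration only.
example_rates = {
    "Country A": {"ihd": 1000.0, "t2dm": 800.0},
    "Country B": {"ihd": 1200.0, "t2dm": 3000.0},
    "Country C": {"ihd": 5000.0, "t2dm": 900.0},
    "Country D": {"ihd": 6000.0, "t2dm": 3500.0},
}
regions = classify_regions(example_rates)
print(regions)
```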