Articles published on partial dependence plots

945 Search results
  • Research Article
  • 10.1097/asw.0000000000000347
A Longitudinal Investigation of Stage 2 Pressure Injury Outcomes With Machine Learning Technique to Identify Relevant Factors.
  • Oct 1, 2025
  • Advances in skin & wound care
  • Jae Hyung Jeon + 2 more

Pressure injuries (PIs) have become a global issue because of their significant social costs, which are associated with various factors. Although many factors have been shown to affect PIs, what specifically contributes to worsening of the disease remains unclear. The aim of this study was to use machine learning to analyze variables that are highly correlated with PI aggravation. This observational study examined 71 patients with Stage 2 PIs from May 2018 to June 2021. The authors classified patients into 2 groups according to wound progression: (1) group A, the aggravated group, and (2) group B, the healed group. All 24 factors were analyzed using Random Forest, a machine learning algorithm, with a hyperensemble approach: each Random Forest was composed of 50,000 decision trees, and the results from 100 Random Forests were hyperensembled. The mean decrease in accuracy was calculated to evaluate the importance of each factor, and overlapped partial dependence plots were obtained to interpret the risk factors. Group A had 14 patients, whereas group B had 57. In the machine learning analysis, the following factors were highly associated with aggravation of PIs: serum albumin, Braden Scale, hemoglobin, wound size, serum blood urea nitrogen, body mass index, serum protein, and serum creatinine. In contrast, end-stage renal disease, sex, and myocardial infarction were less strongly associated. The PI prediction model has broad application as a PI prevention tool. In addition, these findings can aid in the development of strategies to minimize the risk of PI aggravation.
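Partial dependence, as used in this and the following studies, averages a fitted model's predictions over the dataset while one feature is held fixed at each grid value. The plain-Python sketch below shows that standard brute-force computation; `toy_model` is a hypothetical stand-in for a trained random forest, and the study's overlapped plots across 100 forests are not reproduced.

```python
from typing import Callable, Sequence

Row = list[float]
Model = Callable[[Sequence[Row]], list[float]]


def partial_dependence(model: Model, X: Sequence[Row], feature: int,
                       grid: Sequence[float]) -> list[float]:
    """Brute-force one-dimensional partial dependence of `model` on `feature`."""
    curve = []
    for value in grid:
        # Copy every row and overwrite the chosen feature with the grid value.
        modified = [list(row) for row in X]
        for row in modified:
            row[feature] = value
        preds = model(modified)
        # The partial dependence value is the mean prediction over the dataset.
        curve.append(sum(preds) / len(preds))
    return curve


def toy_model(rows: Sequence[Row]) -> list[float]:
    """Hypothetical model: risk rises with feature 0 and falls with feature 1."""
    return [0.5 * r[0] - 0.2 * r[1] for r in rows]


X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
pd_curve = partial_dependence(toy_model, X, feature=0, grid=[0.0, 1.0, 2.0])
print(pd_curve)  # approximately [-0.8, -0.3, 0.2]
```

Because the toy model is additive, the curve is a straight line with slope 0.5; with a forest, the same loop would expose thresholds and plateaus instead.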

  • Research Article
  • 10.1016/j.watres.2025.123976
Machine learning-based optimization of enhanced nitrogen removal in a full-scale urban wastewater treatment plant with ecological combination ponds.
  • Oct 1, 2025
  • Water research
  • Jinhu Yun + 8 more


  • Research Article
  • 10.2196/73840
Prediction of Moderate-to-Severe Sepsis-Associated Acute Kidney Injury Using a Dual-Timepoint Machine Learning Model: Development, Multiregional Validation, and Clinical Deployment Study
  • Sep 30, 2025
  • Journal of Medical Internet Research
  • Xinbo Ge + 9 more

Background: Sepsis-associated acute kidney injury (SA-AKI) is a frequent and life-threatening complication in patients in the intensive care unit (ICU), significantly increasing both mortality and the risk of chronic kidney dysfunction. However, existing prediction models have often focused on overall risk and lack severity-based stratification, which limits their clinical applicability. Objective: This study aimed to identify critical time points in SA-AKI development and to validate dynamic, stratified machine learning models predicting moderate-to-severe (Kidney Disease: Improving Global Outcomes guideline stages 2-3) SA-AKI through multicenter, multiregional external validation, ultimately deploying them as publicly accessible, interpretable clinical decision support tools. Methods: This study used three independent ICU databases: Medical Information Mart for Intensive Care-IV v3.0 (n=12,842; model development and internal validation), electronic ICU collaborative research database v2.0 (n=15,767; North American multicenter external validation), and the First Affiliated Hospital of Hainan Medical University ICU (n=210; Chinese single-center external validation). We identified 48 hours (acute phase) and 7 days (subacute phase) as critical time points. Using clinical data from the first 24 hours of ICU admission, we applied a two-stage feature selection process that combined light gradient boosting machine (LightGBM) and Shapley additive explanation (SHAP) cross-validation analysis with clinical expert review, followed by modeling with 8 machine learning algorithms. The optimal model was selected based on the area under the receiver operating characteristic curve (AUC), calibration curves, and decision curve analysis. Internal validation used 5-fold cross-validation, while external validation and subgroup analyses assessed generalizability across regions and populations. SHAP values and partial dependence plots were used to interpret the influence of key features on predictions. Results: Our dual-timepoint LightGBM model demonstrated robust predictive performance. For the 48-hour prediction task, the model achieved an AUC of 0.839 (95% CI 0.824-0.854) in the internal test set and AUCs of 0.770 (95% CI 0.762-0.779) and 0.793 (95% CI 0.726-0.856) in the two external validation cohorts, respectively. For the 7-day prediction task, the corresponding AUCs across the three cohorts were 0.834 (95% CI 0.818-0.850), 0.720 (95% CI 0.711-0.729), and 0.773 (95% CI 0.687-0.851), respectively. Subgroup analyses confirmed robust model performance across age, gender, and comorbidity subgroups. SHAP analysis identified urine output, mechanical ventilation, Sequential Organ Failure Assessment score, creatinine, Glasgow Coma Scale score, and nephrotoxic drug use as core predictive features. Decision curve analysis confirmed that LightGBM provided consistent clinical benefit across different threshold ranges. The optimal LightGBM model was deployed as a publicly accessible web-based prediction app with integrated SHAP interpretability. Conclusions: This study developed and validated a dynamic, stratified prediction system that provides stage-specific risk assessment for moderate-to-severe SA-AKI. The system underwent rigorous multiregional, multicenter validation and was translated into an interpretable clinical decision support tool, providing a scientific foundation for precision management.
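The AUC values reported above have a simple probabilistic reading: the chance that a randomly chosen patient who develops the outcome receives a higher predicted risk than a randomly chosen patient who does not. The plain-Python sketch below computes AUC directly from that pairwise (Mann-Whitney) definition, counting ties as one half; the labels and scores are hypothetical, and the study's confidence intervals are not reproduced.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as P(score of random positive > score of random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    # O(P * N) pairwise comparison: fine for a small illustration.
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
auc = roc_auc(labels, scores)
print(auc)  # 7 of 9 positive-negative pairs are ordered correctly
```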

  • Research Article
  • 10.1038/s41598-025-14372-7
Enhancing software effort estimation with random forest tuning and adaptive decision strategies.
  • Sep 30, 2025
  • Scientific reports
  • Priya Varshini A G + 2 more

Software effort estimation (SEE) is a vital project-management task because it underpins resource allocation and project planning. Numerous algorithms have been investigated for forecasting software effort, yet achieving precise predictions remains a significant hurdle in the software industry. Machine learning algorithms are employed to improve accuracy, and the Random Forest (RF) algorithm has produced better accuracy than various other algorithms. In this paper, the prediction is extended by increasing the number of trees, and an Improved Random Forest (IRF) is implemented that incorporates three decision techniques, namely residual analysis, partial dependence plots, and feature engineering, to improve prediction accuracy. To make the improved random forest adaptive, it is further extended by integrating three techniques: Bayesian Optimization with Deep Kernel Learning (BO-DKL) to set hyperparameters adaptively, Time-Series Residual Analysis to detect autocorrelation patterns in model errors, and the Explainable AI techniques Shapley Additive Explanations (SHAP) and Local Interpretable Model-Agnostic Explanations (LIME) to improve feature interpretability. Together, these components of the Improved Adaptive Random Forest (IARF) support a comprehensive evaluation and improvement of prediction accuracy. The evaluation metrics are Mean Absolute Error (MAE), Root Mean Square Error (RMSE), R-squared, Mean Absolute Percentage Error (MAPE), Mean Absolute Scaled Error (MASE), and Prediction Interval Coverage Probability (PICP). Across all data sets, compared with the Random Forest model, the improved adaptive RF model achieved an average improvement of 18.5% in MAE, 20.3% in RMSE, 3.8% in R2, and 5.4% in MAPE, a 7% reduction in MASE, and a 3-5% improvement in PICP.
These findings show that combining adaptive learning methods with explainability-based adjustments considerably improves the accuracy of software effort estimation models and supports more trustworthy decision-making in software development projects.
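The abstract lists six evaluation metrics without formulas. The plain-Python sketch below uses common textbook definitions; the paper's exact MASE scaling and prediction-interval construction are not stated, so scaling MASE by the mean absolute one-step naive error on a training series, and taking interval bounds as given inputs, are assumptions. All numbers are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred, y_train, lower, upper):
    """MAE, RMSE, R2, MAPE (%), MASE and PICP for one set of predictions.

    Assumptions: MAPE needs non-zero actuals; MASE is scaled by the mean absolute
    one-step naive error on `y_train`; PICP counts actuals inside [lower, upper].
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    naive = [abs(y_train[i] - y_train[i - 1]) for i in range(1, len(y_train))]
    mase = mae / (sum(naive) / len(naive))
    picp = sum(lo <= t <= hi for t, lo, hi in zip(y_true, lower, upper)) / n
    return {"MAE": mae, "RMSE": rmse, "R2": r2, "MAPE": mape,
            "MASE": mase, "PICP": picp}


# Hypothetical effort values (e.g. person-months) with prediction intervals.
m = regression_metrics(
    y_true=[10, 20, 30, 40], y_pred=[12, 18, 33, 40],
    y_train=[5, 10, 20, 25],
    lower=[9, 15, 31, 35], upper=[13, 25, 35, 45],
)
print(m)
```

Here three of the four actuals fall inside their intervals, so PICP is 0.75; MASE below 1 means the model beats the naive one-step forecast on this scale.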

  • Research Article
  • 10.31158/jeev.2025.38.3.711
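Applying Random Survival Forest to Educational Longitudinal Data Analysis: Predicting When University Students Begin Participating in Private Tutoring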
  • Sep 30, 2025
  • Korean Society for Educational Evaluation
  • Meereem Kim

This study introduces random survival forest (RSF), a machine learning-based survival analysis technique, and validates its utility for educational longitudinal data. RSF is a non-parametric method that handles non-linear relationships and variable interactions without the proportional hazards assumption. Using data from 3,656 university students in the Korean Education Longitudinal Study 2005, we compared RSF with the Cox model and the Kaplan-Meier method for predicting initial participation in employment-related private tutoring. RSF achieved similar predictive accuracy to the Cox model while operating stably without variable selection procedures, and demonstrated superior performance compared to the Kaplan-Meier method throughout the follow-up period. Variable importance analysis identified key predictive factors, and partial dependence plots visualized each variable’s contribution patterns and cumulative risk over time. This study contributes to expanding methodological diversity in educational longitudinal research by demonstrating RSF’s applicability to educational data analysis.
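The Kaplan-Meier method used as a baseline here estimates the survival function S(t), in this setting the probability of not yet having started tutoring, by multiplying conditional survival probabilities at each observed event time. The plain-Python sketch below uses hypothetical data and the usual convention that records censored at a tied time still count as at risk at that time.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate at each distinct event time.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    Returns a list of (time, S(time)) pairs.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_leaving = 0
        # Group all records sharing time t (events and censorings alike).
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            surv *= 1.0 - n_events / at_risk
            curve.append((t, surv))
        at_risk -= n_leaving  # everyone observed at t leaves the risk set
    return curve


curve = kaplan_meier(times=[1, 2, 2, 3, 4, 5], events=[1, 1, 0, 1, 0, 1])
print(curve)
```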

  • Research Article
  • 10.1038/s41598-025-17878-2
Travel efficiency in urban space: the role of built environment in shaping excess travel distance across transport modes.
  • Sep 29, 2025
  • Scientific reports
  • Sangwan Lee + 2 more

This study examines the relationships between three built environment factors (i.e., urban density, transportation accessibility, and neighborhood design) and excess travel distance across three transportation modes: private vehicles (CAR), public transportation (PT), and active transportation (AT) in Gunsan, a medium-sized city in South Korea. Using mobile phone-based mobility data, the analysis integrates Exploratory Factor Analysis, Multilevel Regression, and Extreme Gradient Boosting with Partial Dependence Plots to identify mode-specific and context-dependent patterns in travel inefficiency. The findings indicate that increased density at both trip origins and destinations is generally associated with reduced excess travel distance, particularly for PT and AT users; however, extremely high density may induce inefficiencies for the CAR and PT modes. Accessibility shows a counterintuitive effect: higher accessibility correlates with greater excess travel across all modes. For urban design, a U-shaped relationship emerges for the PT and AT modes: excess travel initially decreases at lower levels of design quality, increases at moderate levels (e.g., mid-range compactness and POI diversity), and then declines again at higher levels. This study offers (1) a more comprehensive understanding of how urban form influences travel efficiency and (2) implications for enhancing urban efficiency and sustainability.

  • Research Article
  • 10.1080/15440478.2025.2559381
Cutting-Edge Hybrid Machine Learning Models for Forecasting the Acid Resistance of Cementitious Composites Incorporating Eggshell and Glass Powders
  • Sep 29, 2025
  • Journal of Natural Fibers
  • Irfan Ullah + 3 more

This research introduced advanced hybrid machine learning (ML) techniques to create an efficient model for estimating the compressive strength after acid attack (CSAA). The models were developed from mixtures containing eggshell powder (ESP) and glass powder (GP). Support vector regression (SVR) was integrated with three metaheuristic optimization techniques, namely particle swarm optimization (PSO), the firefly algorithm (FFA), and gray wolf optimization (GWO), to develop forecasting models for the CSAA of cementitious composites. Conventional ML models, random forest (RF) and decision tree (DT), were also used for comparison. All three hybrid models demonstrated strong predictive capability, with SVR-PSO proving the most reliable: it attained the highest coefficient of determination (R2), 0.984, surpassing SVR-GWO (0.981) and SVR-FFA (0.980). In contrast, the RF model recorded an R2 of 0.974, while the DT model had a markedly lower R2 of 0.649. SHapley Additive exPlanations (SHAP) and partial dependence plot analyses highlighted the substantial impact of various parameters, revealing that compressive strength (CS) was the most influential factor, followed by GP and ESP. CS and GP had positive effects on CSAA, while ESP had a negative effect. A user-friendly interface was developed to predict CSAA efficiently.
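In SVR-PSO hybrids, particle swarm optimization searches the SVR hyperparameter space for the setting with the lowest validation error. A minimal global-best PSO in plain Python is sketched below. The quadratic `cv_error` is a hypothetical stand-in for an SVR's cross-validated error as a function of (log C, log gamma), and the coefficients w = 0.7, c1 = c2 = 1.5 are common defaults, not values from the study.

```python
import random


def pso_minimize(objective, bounds, n_particles=30, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best particle swarm optimization of `objective` within box `bounds`."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                    # each particle's best position
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # best position in the swarm
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia, plus pulls toward the personal and global bests.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clip to box
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def cv_error(params):
    """Hypothetical smooth stand-in for SVR validation error over (log C, log gamma)."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


best, best_val = pso_minimize(cv_error, bounds=[(-3.0, 3.0), (-5.0, 1.0)])
print([round(v, 3) for v in best], round(best_val, 6))
```

In a real pipeline each `objective` call would train and validate an SVR, so the particle and iteration counts directly set the tuning cost.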

  • Research Article
  • 10.1002/spy2.70110
A Stacked Ensemble Framework for Android Malware Detection Using Semantic Permission Aggregation and Explainable AI
  • Sep 28, 2025
  • SECURITY AND PRIVACY
  • Priya Pudke + 1 more

Android is a popular malware target because of its rapid proliferation, which emphasizes the need for efficient and understandable detection methods. This paper presents a novel ensemble-based framework that combines semantic feature engineering (SFE) and explainable artificial intelligence (XAI), addressing the generalizability and transparency issues of current models. Low-level Android permissions are aggregated into high-level semantic categories to improve interpretability and reduce feature dimensionality, making the model easier to understand. A stacked ensemble classifier, consisting of k-nearest neighbors (KNN), random forest (RF), gradient boosting (GB), and support vector machine (SVM) base models with logistic regression (LR) as the meta-learner, is optimized with grid search and evaluated on the TUANDROMD dataset. The model demonstrates strong class separation and generalization, with a test accuracy of 97.54%, an F1-score of 0.9847, and a ROC-AUC of 0.9929. For explainability, permutation feature importance (PFI), partial dependence plots (PDPs), and a global surrogate decision tree (GSDT) are used. The results show that network and location permissions are the most important features. This study offers a robust and interpretable solution for Android malware detection; its combination of SFE, ensemble learning (EL), and XAI makes it suitable for real-world cybersecurity applications, especially on mobile devices.
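Permutation feature importance, one of the explainability tools named here, measures how much a performance metric drops when a single feature's column is randomly shuffled; the "mean decrease accuracy" of random forests rests on the same idea. The plain-Python sketch assumes a fitted model exposed as a `predict` function; `toy_classifier` and the binary permission-group features are hypothetical.

```python
import random


def permutation_importance(predict, X, y, metric, n_repeats=5, seed=0):
    """Mean drop in `metric` when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = metric(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            drops.append(baseline - metric(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def toy_classifier(rows):
    """Hypothetical model: flags malware when permission group 0 is requested."""
    return [1 if r[0] > 0.5 else 0 for r in rows]


X = [[1, 0], [0, 1], [1, 1], [0, 0], [1, 0], [0, 1], [1, 1], [0, 0]]
y = [1, 0, 1, 0, 1, 0, 1, 0]
imp = permutation_importance(toy_classifier, X, y, accuracy)
print(imp)  # feature 1 is ignored by the model, so its importance is 0.0
```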

  • Research Article
  • 10.1038/s41598-025-18757-6
An evaluation of maximizing production and usage of biofuel by machine learning and experimental approach
  • Sep 26, 2025
  • Scientific Reports
  • Krishnamoorthy Ramalingam + 7 more

This work integrates the experimental conversion of waste cooking oil (WCO) into biodiesel with advanced machine learning modeling to optimize transesterification outcomes. A reusable CaO catalyst derived from eggshells was employed, offering a more affordable and sustainable option than typical homogeneous catalysts. A total of 16 experimental runs were conducted to investigate the effects of catalyst concentration (CC), reaction temperature (RT), and methanol-to-oil molar ratio (MOR) on biodiesel yield. Four boosting algorithms (XGBoost, AdaBoost, Gradient Boosting Machine [GBM], and CatBoost) were applied to model the process, with hyperparameters tuned via grid search and reliability checked through k-fold cross-validation (k = 5) and residual plots to mitigate overfitting. CatBoost emerged as the best-performing model (R² = 0.955, RMSE = 0.83, MSE = 0.68, MAE = 0.52), predicting a maximum biodiesel yield of 95% at 3% CC, 80 °C RT, and a 6:1 MOR. Feature importance and partial dependence plots identified MOR and CC as the most influential parameters. Engine performance tests further validated the practical viability of CaO-based biodiesel, showing 26% lower CO emissions and 13% lower smoke emissions than diesel, at the cost of a marginal 2.83% decline in brake thermal efficiency and a 4.31% rise in fuel consumption. This interdisciplinary approach, combining green catalyst development with interpretable machine learning, demonstrates a promising pathway for cleaner energy applications and data-driven optimization in biodiesel research.
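Grid search with k-fold cross-validation, as used here to tune the boosting models, scores every candidate hyperparameter value by its average held-out error and keeps the best one. To keep the loop visible without external libraries, the sketch below tunes the neighbor count of a hypothetical one-dimensional k-nearest-neighbor regressor rather than XGBoost or CatBoost, on invented data.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle range(n) once, then deal it into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train_x, train_y, x, k):
    """Hypothetical stand-in model: mean target of the k nearest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def cv_mse(xs, ys, k_neighbors, n_folds=5):
    """Mean squared error over all held-out points in n_folds-fold cross-validation."""
    sq_err = []
    for fold in k_fold_indices(len(xs), n_folds):
        held_out = set(fold)
        tr_x = [x for i, x in enumerate(xs) if i not in held_out]
        tr_y = [y for i, y in enumerate(ys) if i not in held_out]
        for i in fold:
            pred = knn_predict(tr_x, tr_y, xs[i], k_neighbors)
            sq_err.append((ys[i] - pred) ** 2)
    return sum(sq_err) / len(sq_err)


def grid_search(xs, ys, grid, n_folds=5):
    """Return the grid value with the lowest cross-validated MSE, plus all scores."""
    scores = {k: cv_mse(xs, ys, k, n_folds) for k in grid}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical 1-D data: a yield-like response peaking mid-range.
xs = [0.5 * i for i in range(20)]
ys = [90.0 - (x - 5.0) ** 2 for x in xs]
best_k, scores = grid_search(xs, ys, grid=[1, 3, 5, 9])
print(best_k, scores)
```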

  • Abstract
  • 10.1017/ash.2025.354
Development of a Prediction Model for Carbapenem-Resistant Enterobacterales Acquisition in Liver Transplant Recipients
  • Sep 24, 2025
  • Antimicrobial Stewardship & Healthcare Epidemiology : ASHE
  • Mijung Kim + 1 more

Background: Carbapenem-resistant Enterobacterales (CRE) are significant healthcare-associated pathogens. Liver transplant (LT) recipients are particularly vulnerable to CRE acquisition due to frequent hospitalizations, extensive antibiotic exposure, and prolonged stays in intensive care units. This study aimed to develop and evaluate prediction models for CRE acquisition in LT recipients at a hospital where more than 500 LT surgeries are performed annually. Method: This case-control study retrospectively analyzed the electronic medical records of 1,250 adult LT recipients (250 CRE-positive and 1,000 CRE-negative cases) at a 2,768-bed tertiary hospital in Seoul, Korea, from February 2020 to February 2024. Data imbalance was addressed using the synthetic minority over-sampling technique, and missing values were handled through median imputation and k-nearest neighbor imputation methods. Prediction models were developed using logistic regression, random forest, and extreme gradient boosting (XGBoost) algorithms, with optimal models selected through 5-fold cross-validation and recursive feature elimination. Model interpretability was enhanced using Shapley additive explanations and partial dependence plot analyses. Result: Of the CRE isolates, 94% were carbapenemase-producing Enterobacterales, with Klebsiella pneumoniae comprising 55.7% of all CRE isolates. Univariate analysis revealed significant differences between groups in LT month (June-September, p<.001), mechanical ventilation over 72 hours (p=.002), and model for end-stage liver disease (MELD) score (p=.041). The XGBoost model, selected as the final model, demonstrated strong specificity (0.848) and a high negative predictive value (NPV 0.830) for identifying non-carriers, although its overall predictive power was limited. 
Features used in the XGBoost model included LT month, third-generation cephalosporins, and the presence of hepatocellular carcinoma, all of which showed a positive correlation with CRE acquisition. In contrast, mechanical ventilation over 72 hours and living donor LT exhibited negative correlations. Viral hepatitis and body mass index were included in the model, but their impact on CRE acquisition risk remained unclear. Notably, the negative association of mechanical ventilation contrasts with findings from previous studies, highlighting the need for further investigation. Conclusion: This study demonstrates the clinical relevance of machine learning models in predicting CRE acquisition among LT recipients. The XGBoost model showed high specificity and NPV, indicating its potential to effectively identify low-risk patients. Future studies could benefit from adopting prospective, multicenter designs to clarify causal relationships and improve model performance.
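SMOTE, used in this study to address the 250:1,000 class imbalance, creates synthetic minority-class records by interpolating between a minority sample and one of its nearest minority neighbors. The plain-Python sketch below implements that core idea for numeric features only; it omits refinements found in production libraries, such as handling of categorical variables, and the data are hypothetical.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """SMOTE-style oversampling for numeric feature vectors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbors of the base sample (Euclidean distance).
        others = [m for j, m in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda m: math.dist(base, m))[:k]
        nb = rng.choice(neighbors)
        gap = rng.random()  # random position on the segment from base to neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


# Hypothetical minority-class records with two numeric features.
minority = [[1.0, 2.0], [2.0, 3.0], [3.0, 2.5], [2.5, 1.0]]
new_points = smote(minority, n_synthetic=6, k=2)
print(new_points)
```

In practice oversampling is usually applied only to the training folds, so that synthetic points do not leak into validation data.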

  • Research Article
  • 10.1080/19396368.2025.2560839
Prediction of polycystic ovary syndrome using machine learning with SFS and Boruta feature selection: an explainable AI approach
  • Sep 21, 2025
  • Systems Biology in Reproductive Medicine
  • Monali Ramteke + 1 more

Polycystic Ovary Syndrome (PCOS) is a complex endocrine disorder affecting many women of reproductive age, characterized by a variety of clinical and biochemical features. Accurate classification and diagnosis of PCOS remain challenging because of the heterogeneous nature of its manifestations. This study introduces a robust machine learning framework that combines a voting ensemble model with two distinct feature selection techniques, Sequential Forward Selection (SFS) and Boruta, to improve accuracy in classifying PCOS. We also used Explainable Artificial Intelligence (XAI) techniques, namely Shapley Additive Explanations (SHAP), Local Interpretable Model-agnostic Explanations (LIME), Partial Dependence Plot (PDP), AnchorTabular, and Permutation Importance, to interpret the ensemble model. These methods provide essential insights into the significance of key features for predicting PCOS. The results show that the proposed ensemble model performed best when combined with feature selection; specifically, the voting ensemble classifier using features selected by SFS achieved the highest accuracy of all models. This method can help in PCOS diagnosis and support early intervention.
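Sequential Forward Selection is a greedy wrapper method: start from an empty feature set and, at each step, add the feature whose inclusion gives the best model score. The sketch below assumes a caller-supplied `score` function (in the study this would be something like cross-validated accuracy of the voting ensemble) and uses one common stopping rule, halting when no candidate improves the score; `toy_score` and the feature names are hypothetical.

```python
def sequential_forward_selection(features, score, max_features):
    """Greedy SFS: add the best-scoring feature each round; stop when none helps."""
    selected, remaining = [], list(features)
    best_score = float("-inf")
    while remaining and len(selected) < max_features:
        # Score every one-feature extension of the current subset.
        candidate, cand_score = max(
            ((f, score(selected + [f])) for f in remaining), key=lambda t: t[1]
        )
        if cand_score <= best_score:
            break  # no candidate improves the score
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = cand_score
    return selected, best_score


# Hypothetical per-feature usefulness, with a small penalty per selected feature.
WEIGHTS = {"follicle_count": 0.30, "amh_level": 0.20, "bmi": 0.08, "record_id": 0.0}


def toy_score(subset):
    return sum(WEIGHTS[f] for f in subset) - 0.05 * len(subset)


chosen, score_value = sequential_forward_selection(WEIGHTS, toy_score, max_features=4)
print(chosen, round(score_value, 2))  # ['follicle_count', 'amh_level', 'bmi'] 0.43
```

The uninformative `record_id` is never added because its only effect is the size penalty, which is the behavior a wrapper method is meant to show.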

  • Research Article
  • 10.3390/s25185797
An Explainable Deep Learning-Based Predictive Maintenance Solution for Air Compressor Condition Monitoring
  • Sep 17, 2025
  • Sensors (Basel, Switzerland)
  • Alexandru Ciobotaru + 3 more

Air compressors are vital across various sectors, including automotive, manufacturing, buildings, and healthcare: they provide pressurized air for vehicle air suspension systems, power pneumatic machines throughout industrial production lines, and support non-clinical infrastructure in hospitals, including pneumatic control systems, isolation room pressurization, and laboratory equipment operation. Ensuring that such components are reliable is critical, as unexpected failures can disrupt facility functions and compromise patient safety. Predictive maintenance (PdM) has emerged as a key factor in enhancing the reliability and operational efficiency of medical devices by leveraging sensor data and artificial intelligence (AI)-based algorithms to detect component degradation before functional failures occur. This paper presents a predictive maintenance solution for condition monitoring and fault prediction of an air compressor's exhaust valve, bearings, water pump, and radiator, comparing a hybrid model, which uses a deep neural network (DNN) as a feature extractor and a support vector machine (SVM) for condition classification, against a pure DNN classifier and a standalone SVM model. Each model was trained and validated on three devices (NVIDIA T4 GPU, Raspberry Pi 4 Model B, and NVIDIA Jetson Nano), and performance is reported in terms of latency, energy consumption, and CO2 emissions. Moreover, three model-agnostic explainable AI (XAI) methods were employed to increase the transparency of the hybrid model's final decision: Shapley additive explanations (SHAP), local interpretable model-agnostic explanations (LIME), and partial dependence plots (PDP).
Averaged across all devices, the hybrid model achieves 98.71% accuracy, 99.25% precision, 98.78% recall, and a 99.01% F1-score. By comparison, the DNN baseline achieves on average 93.2%, 88.33%, 90.45%, and 89.37%, and the SVM model 93.34%, 88.11%, 95.41%, and 91.62%, for accuracy, precision, recall, and F1-score, respectively. Integrating XAI methods into the PdM pipeline enhances the transparency, interpretability, and trustworthiness of predictive outcomes, thereby facilitating informed decision-making by maintenance personnel.
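Accuracy, precision, recall, and F1-score here, like the specificity and negative predictive value reported in the CRE study above, all derive from the four cells of a binary confusion matrix. The plain-Python sketch below computes them from hypothetical labels and predictions; defining 0/0 as 0 is a convention chosen for this sketch.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix metrics for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        return num / den if den else 0.0  # define 0/0 as 0 for this sketch

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # also called sensitivity
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "specificity": ratio(tn, tn + fp),
        "npv": ratio(tn, tn + fn),  # negative predictive value
    }


metrics = classification_metrics(
    y_true=[1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    y_pred=[1, 1, 1, 0, 1, 0, 0, 0, 0, 0],
)
print(metrics)
```

Because F1 is the harmonic mean of precision and recall, reported figures can be sanity-checked: 88.11% precision and 95.41% recall give an F1 of about 91.6%, in line with the SVM's reported 91.62%.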

  • Research Article
  • 10.3389/fonc.2025.1650377
Exploring the prognostic value of EBV DNA in advanced nasopharyngeal carcinoma treated with chemoradiotherapy using AI-based modeling
  • Sep 12, 2025
  • Frontiers in Oncology
  • Yang Yang + 10 more

Background: Epstein–Barr virus (EBV) DNA is a well-established biomarker in nasopharyngeal carcinoma (NPC), but its integration into artificial intelligence (AI)-based prognostic tools remains limited. This study aimed to develop and validate AI models incorporating EBV DNA load levels to predict progression-free survival (PFS) in patients with advanced NPC treated with concurrent chemoradiotherapy (CRT). Methods: A retrospective multicenter cohort of 503 patients was divided into training (n = 301) and validation (n = 202) sets. Four machine learning algorithms (Cox regression, LASSO, RSF, and GBM) were applied to predict 1- and 1.5-year PFS in patients with advanced NPC. Model performance was evaluated using the concordance index (C-index), time-dependent receiver operating characteristic (ROC) curves, and decision curve analysis (DCA), together with interpretability tools such as SHAP values and partial dependence plots (PDP). Results: The 1-, 3-, and 5-year PFS rates were 100.0%, 91.5%, and 88.6% in the EBV = 0 group; 99.4%, 91.2%, and 88.5% in the > 0 and < 1500 group; and 92.3%, 81.0%, and 75.7% in the ≥ 1500 group, respectively, with statistically significant differences among the three groups (P = 0.0024). The RSF model outperformed the other models, with the highest C-index (0.778) and areas under the ROC curve of 0.810 and 0.634 at 1 and 1.5 years, respectively. EBV DNA emerged as the most influential predictor across all interpretability analyses. Patients with EBV DNA ≥1500 copies/ml had the poorest predicted survival, showing a distinct threshold effect in the PDP. Conclusions: High EBV DNA levels were associated with poorer PFS in advanced NPC. Among the models evaluated, the RSF model demonstrated the best predictive performance and interpretability. EBV-informed AI modeling represents a promising approach for enhancing individualized risk prediction and clinical decision-making in NPC.
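Harrell's C-index, used above to rank the survival models, is the fraction of comparable patient pairs whose predicted risks are ordered consistently with their observed times to progression. The plain-Python sketch below assumes that a higher `risk` means earlier expected progression and, for simplicity, skips pairs with tied times; the data are hypothetical.

```python
def harrell_c_index(times, events, risk):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when times[i] < times[j] and subject i had an
    observed event. It is concordant when risk[i] > risk[j]; risk ties count 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


c = harrell_c_index(times=[2, 4, 6, 8], events=[1, 1, 0, 1],
                    risk=[0.9, 0.6, 0.7, 0.2])
print(c)  # 4 of 5 comparable pairs are concordant
```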

  • Research Article
  • 10.3389/fpubh.2025.1659322
A machine learning approach to healthcare needs and barriers using the 100% Community Survey of access to SDOH services
  • Sep 10, 2025
  • Frontiers in Public Health
  • Karikarn Chansiri + 3 more

Background: Access to health care is a key social determinant of health, yet individual experiences of need and barriers, especially in rural and racially diverse regions, are often overlooked. Traditional models may miss complex sociodemographic and household patterns. This study applies machine learning (ML) to examine healthcare needs and access barriers among adults in New Mexico, a diverse state with high service needs. Objectives: (1) Identify predictors of self-reported healthcare needs across medical, dental, and mental health domains; (2) determine factors and reasons linked to access barriers; (3) compare performance across seven ML algorithms; and (4) generate interpretable insights to inform interventions. Methods: We analyzed survey data from 9,099 adults across 13 New Mexico counties (2019–2024). Predictors included sociodemographic, geographic, and household factors. Models spanning linear, tree-based, kernel-based, and neural network approaches were evaluated using recall, F1-score, and area under the precision-recall curve. Interpretability tools included SHAP, partial dependence plots, and permutation importance. Results: (1) Predictors varied by domain. Mental health needs were linked to younger age, low income, limited family support, and being female. Dental needs were highest among higher-income White parents; medical needs were tied to larger households and parenting status. Family support consistently reduced barriers. (2) Common barriers included cost, wait times, and provider shortages. Hispanic respondents reported fewer mental health barriers. (3) Neural networks and tree-based models performed best (recall up to 0.99). (4) Interpretability methods revealed complex, nonlinear predictor patterns. Conclusion: ML models revealed complex, domain-specific patterns of need and access, highlighting the limitations of one-size-fits-all approaches.
Community-based initiatives like 100% Community can leverage these insights to target structurally excluded populations and strengthen local support systems. Hyperlocal planning, state-level policy reform, and family-centered interventions are essential to addressing healthcare disparities in high-need settings.
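This study scored models partly by the area under the precision-recall curve. A common step-wise estimate of that area is average precision: rank cases by predicted score, then average the precision observed at the rank of each true positive. The plain-Python sketch below assumes binary labels and no tied scores; the data are hypothetical.

```python
def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision)."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank cases from highest to lowest score (assumes no tied scores).
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.7, 0.3, 0.2]
ap = average_precision(labels, scores)
print(ap)  # (1/1 + 2/3 + 3/5) / 3
```

Unlike ROC AUC, whose chance level is 0.5, the chance level of average precision is roughly the positive-class prevalence, which makes it more informative when a need or barrier is rare.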

  • Research Article
  • 10.54103/2282-0930/29299
A Random Forest Algorithm For Identifying Risk Factors For Multimorbidity In The UK Biobank Cohort
  • Sep 8, 2025
  • Epidemiology, Biostatistics, and Public Health
  • Linia Patel + 4 more

Introduction: High-income countries are undergoing significant demographic shifts, characterized by population decline and progressive aging. These transformations are associated with an increase in the prevalence of chronic diseases, which often coexist, worsening individuals’ quality of life and increasing healthcare costs. Identifying the factors that contribute to the onset of multimorbidity is particularly complex, as these factors often interact with each other and have effects across multiple diseases. Objectives: This study aimed to identify the main risk factors for multimorbidity within a large UK cohort using a fully nonparametric ensemble method. This approach makes no assumptions about the underlying relationships between variables and can handle high-dimensional data while limiting overfitting. Methods: We analyzed data from the UK Biobank cohort, which includes detailed information on socioeconomic status, lifestyle, anthropometric measures, and environmental exposures collected at recruitment, along with disease occurrence obtained through linkage with hospital admissions (primary and secondary diagnoses), death records, and cancer registries. Multimorbidity was defined as the presence of at least two chronic conditions from a list developed through an international consensus using a modified Delphi method [1]. To assess the role of 18 candidate variables in predicting the onset of multimorbidity over a five-year follow-up, we applied a random forest algorithm adapted for survival analysis within a competing risk framework [2], considering two competing events: the development of multimorbidity and death prior to its onset.
The candidate variables included: white British/Irish ethnicity (yes/no); qualification level; average total household income before tax (adjusted for household size and categorized into quintiles); area-level index of multiple deprivation (deciles); body mass index (kg/m²); waist circumference (cm); pack-years of smoking; alcohol drinking (g/day); healthy diet score (ranging from 0 to 5, based on the intake of fruit, vegetables, fish, whole grains, and processed and red meat); walking, moderate physical activity, and vigorous physical activity (each as the number of times per week lasting at least 10 min); fine particulate matter (PM2.5), PM2.5-10, and PM10 air pollution (µg/m³); NO2 (µg/m³); and average exposure to evening (7:00 pm – 11:00 pm) or night (11:00 pm – 7:00 am) noise (dB). Results were summarised using out-of-bag partial dependence plots and variable importance (VIMP) metrics. Results: Of the 422,344 individuals included in the cohort, aged between 39 and 73 years, we selected 137,565 participants who were free from the conditions included in the multimorbidity definition at recruitment and for whom risk factor information was available. During the five-year follow-up, 4,384 individuals developed multimorbidity (2,740 males, 1,644 females). The five-year cumulative incidence was 3.9% in males and 2.6% in females. Among individuals who developed multimorbidity during follow-up, the main conditions observed were cancer (52.4% of males and 52.1% of females), arrhythmias (44.7% of males and 28.5% of females), and coronary artery disease (42.1% of males and 24.8% of females). Based on VIMP metrics, the strongest predictors in men were smoking, waist circumference, and sleep duration; in women, they were alcohol, smoking, and waist circumference.
Five-year cumulative incidence was higher for heavy smokers (sex-specific 95th percentile of pack-years; males: 6.3%, females: 4.0%) than for non-smokers (males: 3.5%, females: 2.4%); for individuals with elevated waist circumference (sex-specific 95th percentile; males: 6.1%, females: 5.2%) than for those with median values (males: 3.9%, females: 2.6%); for heavy alcohol drinkers (sex-specific 95th percentile; males: 4.6%, females: 4.0%) than for those with median intake (males: 3.8%, females: 2.4%); and for those sleeping 4 hours/day (males: 6.3%, females: 4.2%) or 10 hours/day (males: 6.5%, females: 4.5%) than for those sleeping 7 hours/day (males: 3.7%, females: 2.5%). Diet, physical activity, and air pollution had smaller effects. Conclusions: Preventive interventions targeting smoking, abdominal obesity, and heavy alcohol consumption among middle-aged adults in the UK, and likely in other high-income countries, may substantially reduce the incidence of multimorbidity. Such interventions could improve the health trajectories and reduce the disease burden of future older populations. In addition, promoting adequate sleep duration appears to be beneficial and should be integrated into public health recommendations.
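The abstract reports five-year cumulative incidence under two competing events (multimorbidity, and death before its onset). As a rough illustration of what "cumulative incidence" means when a competing event can pre-empt the event of interest, here is a minimal sketch of the nonparametric Aalen-Johansen estimator. It is not the authors' random survival forest, which produces model-based cumulative incidence estimates that partial dependence then averages; the function name and the event coding (0 = censored, 1 = multimorbidity, 2 = death) are hypothetical.

```python
from collections import defaultdict


def cumulative_incidence(times, events, cause, horizon):
    """Aalen-Johansen estimate of P(event of type `cause` by time `horizon`).

    times  : follow-up time for each participant (e.g. years since recruitment)
    events : 0 for censored, otherwise an integer cause code, for example
             1 = multimorbidity, 2 = death before multimorbidity (hypothetical coding)
    """
    # Tally events of each type (and censorings, code 0) at each distinct time.
    counts_at = defaultdict(lambda: defaultdict(int))
    for t, e in zip(times, events):
        counts_at[t][e] += 1

    n_at_risk = len(times)
    event_free = 1.0  # overall event-free survival just before the current time
    cif = 0.0
    for t in sorted(counts_at):
        if t > horizon:
            break
        counts = counts_at[t]
        d_cause = counts.get(cause, 0)
        d_any = sum(c for code, c in counts.items() if code != 0)
        # Probability of reaching t event-free, times the cause-specific hazard at t.
        cif += event_free * d_cause / n_at_risk
        event_free *= 1.0 - d_any / n_at_risk
        # Participants with an event or censoring at t leave the risk set
        # (censorings at t are treated as occurring just after events at t).
        n_at_risk -= sum(counts.values())
    return cif
```

For times [1, 2, 3, 4, 5] with events [1, 2, 0, 1, 0], the estimate for cause 1 by time 5 is 0.5 and for cause 2 is 0.2 (up to floating-point rounding). Unlike one minus a Kaplan-Meier curve that treats deaths as censored, the cause-specific incidences together never exceed the overall probability of having any event.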

  • Research Article
  • 10.1080/02664763.2025.2554823
A note on two novel easy-to-interpret feature effect measures for partial dependence plots in a classification setting
  • Sep 6, 2025
  • Journal of Applied Statistics
  • Andreas Karlsson Rosenblad

Classification of observations into one of several distinct categories is a common task in applied statistics, traditionally performed using parametric statistical models such as logistic regression. These parametric models are, however, often outperformed in prediction accuracy by black box supervised learning models (BBSLMs). A drawback of BBSLMs is the lack of easy-to-interpret feature effect measures similar to the odds ratio (OR) for logistic regression models. The present paper derives two novel feature effect measures based on partial dependence plots for binary classification using BBSLMs: the relative risk of marginal effects (RRME) and the odds ratio of marginal effects (ORME). The performance and interpretation of these new measures are illustrated in an application studying the risk of death within 48 hours of admission among individuals admitted to hospital with a myocardial infarction. The BBSLMs are shown to have better predictive ability than the logistic regression models, with the RRMEs and ORMEs of death for the main risk factor, anterior infarct, both being 1.8, comparable to the OR of 1.9 for the logistic regression model. The RRMEs and ORMEs are also shown to be more robust, in that they remain applicable to observations with missing values for some features.
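The abstract does not state the formulas for the RRME and ORME. A plausible reading, offered here only as an assumption, is that each compares partial dependence values for a binary feature: the average predicted probability with the feature set to 1 for every observation, against the average with it set to 0. The sketch below computes those two averages and forms their ratio and the ratio of their odds. All function names and the toy model's coefficients are hypothetical stand-ins, not the paper's definitions or data.

```python
import math


def partial_dependence_binary(predict_proba, rows, feature):
    """Average predicted probability with `feature` forced to 0, then to 1."""
    averages = {}
    for value in (0, 1):
        # Overwrite the feature for every observation, keep all others as observed.
        modified = [{**row, feature: value} for row in rows]
        probs = [predict_proba(r) for r in modified]
        averages[value] = sum(probs) / len(probs)
    return averages[0], averages[1]


def rrme_orme(predict_proba, rows, feature):
    """Hypothetical RRME (risk ratio) and ORME (odds ratio) of the two averages."""
    p0, p1 = partial_dependence_binary(predict_proba, rows, feature)
    rrme = p1 / p0
    orme = (p1 / (1 - p1)) / (p0 / (1 - p0))
    return rrme, orme


# Toy stand-in for a fitted black-box classifier (hypothetical coefficients).
def toy_model(row):
    z = -2.0 + 0.6 * row["anterior_infarct"] + 0.03 * (row["age"] - 65)
    return 1 / (1 + math.exp(-z))


patients = [{"anterior_infarct": a, "age": age}
            for a, age in [(0, 60), (1, 72), (0, 80), (1, 55)]]
rr, or_ = rrme_orme(toy_model, patients, "anterior_infarct")
```

Because only `predict_proba` is called, the same code works with any classifier that returns probabilities. When the baseline risk is low, the two ratios are close, which is consistent with the RRME and ORME both being 1.8 in the application.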

  • Research Article
  • 10.3389/fpubh.2025.1672479
Decoding the association between health level and human settlements environment: a machine learning-driven provincial analysis in China
  • Sep 3, 2025
  • Frontiers in Public Health
  • Haidong Zhu + 1 more

Background: Rapid urbanization in China has significantly reshaped the human settlement environment (HSE), bringing both opportunities and challenges for public health. While existing studies have explored environment-health relationships, most are confined to micro-level contexts, focus on single environmental dimensions, or assess specific diseases, and thus lack a comprehensive, macro-level understanding. Objective: This study aims to assess the associations between population health level and multidimensional HSE features at the provincial level in China, and to uncover the nonlinear relationships and interaction effects underlying this association. Methods: Using panel data from 31 Chinese provinces spanning 2012 to 2022, a composite Health Level Index (HLI) was constructed from four core health indicators using the Entropy-TOPSIS method. Nineteen HSE indicators covering five dimensions (ecological environment, living environment, infrastructure, public services, and sustainable environment) were selected as explanatory variables. The study employed the XGBoost machine learning algorithm to model the relationship between HSE and HLI. SHAP values and Partial Dependence Plots (PDPs) were used to interpret feature importance, nonlinear relationships, threshold values, and interaction effects. Results: XGBoost outperformed all benchmark models, confirming its strong predictive capacity. SHAP analysis identified six key features as the most influential: number of medical institution beds (NMIB), urbanization rate (UR), mobile phone penetration rate (MPPR), road area per capita (RAPC), population density (PD), and urban gas penetration rate (UGPR). Nonlinear relationships and threshold effects were observed between key features and population health level.
PDPs further revealed that optimal health levels are typically associated with high UR, high MPPR, high RAPC, and moderate NMIB, underscoring the importance of structural synergy over isolated infrastructure expansion. Conclusion: This study provides robust evidence that the relationship between HSE and health is nonlinear, multidimensional, and highly interactive. Effective urban health governance requires coordinated development of urbanization, digital infrastructure, and public services, along with rational allocation of healthcare resources. The findings offer actionable insights for health-oriented urban planning and policy formulation in rapidly urbanizing regions.
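The Health Level Index above is built with the Entropy-TOPSIS method. The following is a minimal sketch of the textbook version: entropy weights derived from how much each indicator varies across units, then TOPSIS closeness to the ideal solution. It assumes every indicator is benefit-type (higher is better) and strictly positive; many applications first min-max normalize or invert cost-type indicators, and the study's exact preprocessing is not stated. Function names and the sample matrix are hypothetical.

```python
import math


def entropy_weights(matrix):
    """Entropy weight method. Rows = units (e.g. province-years),
    columns = positive, benefit-type indicators."""
    n, m = len(matrix), len(matrix[0])
    k = 1.0 / math.log(n)
    divergences = []
    for j in range(m):
        col = [row[j] for row in matrix]
        total = sum(col)
        p = [x / total for x in col]
        # Entropy in [0, 1]; a constant column has entropy 1 and gets ~zero weight.
        e = -k * sum(pi * math.log(pi) for pi in p if pi > 0)
        divergences.append(1.0 - e)
    s = sum(divergences)  # assumes at least one column varies across units
    return [d / s for d in divergences]


def topsis_scores(matrix, weights):
    """Relative closeness to the ideal solution, in [0, 1] (higher is better)."""
    m = len(matrix[0])
    # Vector-normalize each column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(m)]
    v = [[weights[j] * row[j] / norms[j] for j in range(m)] for row in matrix]
    best = [max(r[j] for r in v) for j in range(m)]
    worst = [min(r[j] for r in v) for j in range(m)]
    scores = []
    for r in v:
        d_best = math.sqrt(sum((r[j] - best[j]) ** 2 for j in range(m)))
        d_worst = math.sqrt(sum((r[j] - worst[j]) ** 2 for j in range(m)))
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Hypothetical data: 3 units x 2 health indicators.
indicators = [[1.0, 2.0], [2.0, 3.0], [3.0, 5.0]]
index = topsis_scores(indicators, entropy_weights(indicators))
```

A unit that is best on every indicator scores 1 and one that is worst on every indicator scores 0. The entropy step lets the data, rather than expert judgment, decide how much each indicator contributes.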

  • Research Article
  • 10.1038/s41598-025-16477-5
Investigating factors influencing injury severity in crashes involving vulnerable road users in Pakistan.
  • Sep 2, 2025
  • Scientific reports
  • Muhammad Junaid + 4 more

Road traffic crashes claim around 1.19 million lives annually worldwide, with over half of the fatalities involving vulnerable road users (VRUs). While several studies have explored the risk factors associated with specific categories of VRUs in Pakistan, research examining VRUs collectively, across all categories and their distinct safety challenges, remains limited. This study examines the influence of various risk factors on the severity of injuries in crashes involving VRUs, using a three-year dataset (2021-2023). The study evaluated six boosting-based ensemble machine learning classifiers across multiple evaluation metrics. Boosting with decision stumps outperformed extreme gradient boosting, light gradient boosting, histogram-based gradient boosting, categorical boosting, and adaptive boosting in terms of recall, F1-score, and accuracy. The partial dependence plots indicated that VRUs aged 55 years or older, collisions with other VRU groups, involvement of vans and heavy vehicles, rainy weather, the COVID-19 period, and the presence of painted medians increase the likelihood of severe injury in crashes involving VRUs. The pairwise SHAP interaction plot supported these findings, showing that interactions among vehicle type (vans and heavy vehicles), adverse weather, and the COVID-19 lockdown period elevate the risk of severe injury in VRU crashes. Based on the findings, several policy recommendations were proposed, including education and awareness programs, strategies to manage mixed traffic, and improved road infrastructure to enhance safety for all VRU groups.
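The six classifiers were compared on recall, F1-score, and accuracy. For reference, here is a minimal sketch of those metrics for a binary outcome, assuming label 1 means "severe injury". The abstract does not say whether severity was binary or multi-class, or whether averaged (e.g. macro) variants were used, so this shows only the basic binary definitions; the function name is hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, recall, precision, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Recall: share of truly severe crashes that the model flags as severe.
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1}
```

Recall and F1 are usually emphasized alongside accuracy in injury-severity work because severe crashes tend to be the minority class, where a model can reach high accuracy while missing many severe cases.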

  • Research Article
  • 10.1016/j.jad.2025.119399
Machine learning-based predictive modeling of depressive symptoms in Chinese adolescents.
  • Sep 1, 2025
  • Journal of affective disorders
  • Lijie Ding + 3 more

  • Research Article
  • 10.1007/s43995-025-00214-0
High-Accuracy prediction of roughness in CRCP using a hybrid genetic Algorithm–SVR approach
  • Sep 1, 2025
  • Journal of Umm Al-Qura University for Engineering and Architecture
  • Ali Alnaqbi + 3 more

The long-term performance of Continuously Reinforced Concrete Pavement (CRCP) and the optimization of maintenance strategies depend on accurate forecasting of the International Roughness Index (IRI). To predict IRI in CRCP accurately, this study proposes a hybrid modeling framework that combines Support Vector Regression (SVR) with Genetic Algorithm (GA) optimization. Using an extensive dataset from the Long-Term Pavement Performance (LTPP) program comprising 395 observations from 33 CRCP sections, the proposed GA-SVR model was evaluated against several benchmark models: Artificial Neural Networks (ANN), Decision Trees, Random Forests, Linear Regression, and standard SVR. The GA-optimized SVR model significantly outperformed all alternatives, achieving a mean RMSE of 0.039 and a coefficient of determination (R²) of 0.991 across five-fold cross-validation. Comprehensive residual analysis confirmed the model’s stability, while sensitivity analysis and feature importance rankings identified key influential variables such as Initial IRI, Layer 4 Type, and Layer 3 Thickness. Partial Dependence Plots and 3D visualizations further showed how these factors affect IRI trends. The findings underscore the model’s high reliability, interpretability, and potential to support proactive pavement maintenance and design decisions. This research contributes a scalable and interpretable tool for enhancing the predictive capabilities of pavement performance models in data-driven infrastructure management.
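The headline figures (mean RMSE of 0.039, R² of 0.991) are averages over five-fold cross-validation. The sketch below shows that evaluation loop in plain Python: shuffle and split into k folds, fit on k-1 folds, score RMSE and R² on the held-out fold, and average across folds. A one-variable least-squares line stands in for the GA-optimized SVR, which is not reproduced here; every function name is hypothetical. Averaging per-fold scores is one convention; pooling all out-of-fold predictions before scoring is another, and the abstract does not say which was used.

```python
import math
import random


def kfold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and deal them into k folds whose sizes differ by at most one."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def rmse(y, yhat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))


def r2(y, yhat):
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def cross_validate(fit, X, y, k=5, seed=0):
    """`fit(X_train, y_train)` must return a `predict(x)` callable."""
    scores = []
    for test_idx in kfold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        y_test = [y[i] for i in test_idx]
        y_hat = [predict(X[i]) for i in test_idx]
        scores.append((rmse(y_test, y_hat), r2(y_test, y_hat)))
    # Mean of per-fold scores.
    return (sum(s[0] for s in scores) / k, sum(s[1] for s in scores) / k)


def fit_line(xs, ys):
    """Stand-in model: ordinary least-squares line through (xs, ys)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y_ - my) for x, y_ in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: intercept + slope * x
```

Any model wrapped as a `fit` function returning a predictor can be plugged in. In a GA-SVR setup, the genetic algorithm would typically search SVR hyperparameters using a cross-validated score like this one as its fitness.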

Copyright 2025 Cactus Communications. All rights reserved.
