Predicting Toxicities and Survival Outcomes in De Novo Metastatic Hormone-Sensitive Prostate Cancer Using Clinical Features, Routine Blood Tests and Their Early Variations

Abstract

Conventional prognostic factors are typically assessed at diagnosis in metastatic hormone-sensitive prostate cancer (mHSPC). However, variations in vital signs and laboratory parameters occur during systemic treatment and may predict patients' prognosis and anticipate organ-specific toxicity development. This single-center retrospective study included 363 patients with de novo mHSPC treated between 2014 and 2023. Clinical and laboratory data were systematically collected from the hospital data warehouse, from treatment initiation through the following seven months. Variations in vital parameters and blood test results were graded using CTCAE V5.0 (dynamic variables). Cox regression analyses were performed to explore the impact of dynamic variables on progression-free survival (PFS) and overall survival (OS). Machine learning (ML) models (Support Vector Classifier, Random Forest, and LGBM Classifier) were developed to predict single organ-specific toxicities and to identify good and poor responders based on 7-month PSA levels, PFS, and OS. We compared the performance of ML models trained only on baseline factors (static models) with that of models also integrating variables generated by vital-sign and blood-test monitoring within 3 and 7 months of treatment start (dynamic models). Dynamic models failed to improve the prediction of single organ-specific toxicities. Univariable Cox analysis revealed that the development of hematological, liver-related, and kidney-related toxicity, as well as the development of electrolyte disturbances, within 3 or 7 months was associated with shorter PFS (p = 0.011, 0.007, 0.174, and 0.02, respectively) and/or OS (p = 0.001, 0.099, 0.012, and 0.001, respectively). In multivariable Cox analysis, increasing alkaline phosphatase levels (HR = 1.93, p = 0.009), decreasing albumin levels (HR = 1.92, p = 0.008), and the development of hyponatremia (HR = 1.79, p = 0.033) were associated with shorter OS.
The combination of static and dynamic variables significantly improved the ability of ML models to identify poor responders (shorter PFS: AUC range 0.91-0.94 vs. 0.79-0.89). The integration of conventional prognostic factors with the detection of significant changes in vital signs and blood tests occurring early during systemic treatment in patients with de novo mHSPC may enhance patient stratification and improve prediction of survival outcomes. Multicenter validation studies are needed to confirm these results.
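The AUC values used above to compare the static and dynamic models can be computed from predicted risk scores with the rank-based (Mann-Whitney) formulation. A minimal pure-Python sketch, not the study's actual code:

```python
def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Equals the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties counted as 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case from each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example: labels are poor-responder indicators, scores are model outputs.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

A perfect ranking gives 1.0 and random scoring hovers near 0.5, which is why the reported jump from 0.79-0.89 to 0.91-0.94 is a meaningful gain in discrimination.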

Similar Papers
  • Research Article
  • Citations: 6
  • 10.1111/trf.15931
Early recognition of possible transfusion reactions using an electronic automatic notification system for changes in vital signs in patients undergoing blood transfusions.
  • Jul 20, 2020
  • Transfusion
  • Young Ae Lim + 2 more

The study was designed to evaluate the ability of a novel electronic automatic notification system (EANS) to detect significant changes in transfusion-associated vital signs (VSs) during transfusion and to determine whether the EANS improved acute transfusion reaction (ATR) detection rates and suspected ATR reporting rates. VSs were measured three times per unit or batch product transfused (before, 15 minutes after commencement, and at the completion of the transfusion) and recorded on the EANS. Significant changes in VSs were defined as increased temperature (≥38°C or a ≥1°C change from baseline temperature), a 20 mm Hg or 20% increase or decrease in systolic blood pressure, or a 20% increase in pulse rate. The 6-month periods preceding and following the introduction of the EANS were defined as "before" and "after," and data from these periods were used for comparison and evaluation. During the after period, 945 notifications were reported from the EANS and 521 suspected ATRs were detected. The suspected ATR reporting rates for the before and after periods were 0.29% (73/25 213) and 2.06% (521/25 304, P < .001), and the ATR detection rates were 0.13% (33/25 213) and 0.49% (116/25 304, P < .001), respectively. Among the 116 ATR cases, 49.1% could be detected only by significant changes in VSs. The EANS was very effective in detecting ATRs that could otherwise have been overlooked by medical staff. Further data are needed to demonstrate the extent to which the introduction of an EANS may improve the safety of transfused patients.
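The significant vital-sign changes defined above translate into a simple rule check. The sketch below is an illustrative reconstruction with invented field names, not the EANS implementation:

```python
def significant_vs_change(baseline, current):
    """Flag significant vital-sign changes per the EANS criteria:
    temperature >= 38 C or a >= 1 C rise from baseline, a 20 mm Hg or
    20% rise/fall in systolic blood pressure, or a 20% rise in pulse.
    The dict keys (temp_c, sbp, pulse) are illustrative, not the system's.
    """
    alerts = []
    if current["temp_c"] >= 38.0 or current["temp_c"] - baseline["temp_c"] >= 1.0:
        alerts.append("temperature")
    sbp_delta = abs(current["sbp"] - baseline["sbp"])
    if sbp_delta >= 20 or sbp_delta >= 0.20 * baseline["sbp"]:
        alerts.append("systolic_bp")
    if current["pulse"] - baseline["pulse"] >= 0.20 * baseline["pulse"]:
        alerts.append("pulse")
    return alerts

base = {"temp_c": 36.8, "sbp": 120, "pulse": 72}
now = {"temp_c": 38.1, "sbp": 118, "pulse": 80}
print(significant_vs_change(base, now))  # ['temperature']
```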

  • Research Article
  • 10.1093/bjs/znac242.031
O031 Machine learning models in renal transplantation: a systematic review and meta-analysis of predictive performance in graft outcomes
  • Jul 22, 2022
  • British Journal of Surgery
  • B Ravindhran + 6 more

Introduction: Kidney transplantation (KT) is currently the renal replacement therapy of choice for most patients with end-stage kidney disease. Despite many advancements, variations in outcome and the frequent occurrence of graft failure continue to pose important clinical and research challenges. The aim of this study was to carry out a systematic review of the current application of Machine Learning (ML) models in KT and perform a meta-analysis of these models in the prediction of graft outcomes. Methods: This review was registered with the PROSPERO database (CRD42021247469), and all peer-reviewed and preprint original articles that reported the sensitivity and specificity of ML-based models were included in the meta-analysis. Data were analysed using MetaDTA, an interactive online tool for meta-analysis of diagnostic studies. Diagnostic performance was assessed with a summary receiver operating characteristic (sROC) plot. Results: 38 studies met the inclusion criteria for the review and 12 studies met the inclusion criteria for the meta-analysis. The most common models used were artificial neural networks, decision trees, and Bayesian belief networks. Seven studies compared the predictive performance of ML models with traditional regression methods. The summary sensitivity and specificity of ML-based models were 0.84 (95% CI, 0.72–0.91) and 0.68 (95% CI, 0.57–0.77), respectively. The area under the sROC curve for all available evidence was 0.83, and the Diagnostic Odds Ratio of ML models was 11.19 (95% CI 6.66–18.75). Conclusion: Our study shows that ML models can accurately predict outcomes following KT by integrating the vast amounts of available clinical data. Take-home message: This study confirms the ability of ML models to handle complex relationships between large datasets, features, and outcomes, which can improve the precision and accuracy of outcome prediction.
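As a rough consistency check, the Diagnostic Odds Ratio can be derived from the pooled sensitivity and specificity. The paper's value of 11.19 comes from a bivariate meta-analytic model, so a naive point calculation lands close to, but not exactly on, it:

```python
def diagnostic_odds_ratio(sensitivity, specificity):
    """DOR = (sens / (1 - sens)) / ((1 - spec) / spec):
    the odds of a positive test in diseased versus non-diseased subjects."""
    return (sensitivity / (1 - sensitivity)) / ((1 - specificity) / specificity)

# Using the review's pooled estimates of 0.84 and 0.68.
print(round(diagnostic_odds_ratio(0.84, 0.68), 2))  # 11.16
```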

  • Research Article
  • Citations: 22
  • 10.2196/47833
Machine Learning Models for Blood Glucose Level Prediction in Patients With Diabetes Mellitus: Systematic Review and Network Meta-Analysis.
  • Nov 20, 2023
  • JMIR Medical Informatics
  • Kui Liu + 9 more

Machine learning (ML) models offer patients with diabetes mellitus (DM) more options for properly managing blood glucose (BG) levels. However, because numerous types of ML algorithms exist, choosing an appropriate model is vitally important. In a systematic review and network meta-analysis, this study aimed to comprehensively assess the performance of ML models in predicting BG levels. In addition, we assessed ML models used to detect and predict adverse BG (hypoglycemia) events by calculating pooled estimates of sensitivity and specificity. PubMed, Embase, Web of Science, and IEEE Xplore databases were systematically searched for studies on predicting BG levels and predicting or detecting adverse BG events using ML models, from inception to November 2022. Studies that assessed the performance of different ML models in predicting or detecting BG levels or adverse BG events in patients with DM were included. Studies with no derivation or performance metrics of ML models were excluded. The Quality Assessment of Diagnostic Accuracy Studies tool was applied to assess the quality of included studies. Primary outcomes were the relative ranking of ML models for predicting BG levels in different prediction horizons (PHs) and pooled estimates of the sensitivity and specificity of ML models in detecting or predicting adverse BG events. In total, 46 eligible studies were included for meta-analysis. Regarding ML models for predicting BG levels, the mean root mean square error (RMSE) in PHs of 15, 30, 45, and 60 minutes was 18.88 (SD 19.71), 21.40 (SD 12.56), 21.27 (SD 5.17), and 30.01 (SD 7.23) mg/dL, respectively. The neural network model (NNM) showed the highest relative performance across PHs.
Furthermore, the pooled estimates of the positive likelihood ratio and the negative likelihood ratio of ML models were 8.3 (95% CI 5.7-12.0) and 0.31 (95% CI 0.22-0.44), respectively, for predicting hypoglycemia and 2.4 (95% CI 1.6-3.7) and 0.37 (95% CI 0.29-0.46), respectively, for detecting hypoglycemia. Statistically significant high heterogeneity was detected in all subgroups, with different sources of heterogeneity. For predicting precise BG levels, the RMSE increases with a rise in the PH, and the NNM shows the highest relative performance among all the ML models. Meanwhile, current ML models have sufficient ability to predict adverse BG events, while their ability to detect adverse BG events needs to be enhanced. PROSPERO CRD42022375250; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=375250.
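The RMSE values reported per prediction horizon are standard root-mean-square errors in mg/dL. A minimal sketch on hypothetical glucose forecasts (the numbers are invented, not from the review):

```python
import math

def rmse(actual, predicted):
    """Root mean square error, in the same units as the input (mg/dL here)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical 30-minute-horizon glucose forecasts in mg/dL.
actual = [110, 145, 160, 128]
pred = [118, 150, 148, 130]
print(round(rmse(actual, pred), 2))  # 7.7
```

Because squared errors grow with the forecast gap, the review's pattern of RMSE rising from the 15-minute to the 60-minute horizon is the expected behavior.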

  • Research Article
  • 10.18502/japh.v8i3.13783
Ensemble learning models for the prediction of the weekly peak of PM2.5 concentration in Algiers, Algeria
  • Oct 8, 2023
  • Journal of Air Pollution and Health
  • Sabri Ghazi + 4 more

Introduction: This paper focuses on the prediction of weekly peak levels of particulate matter with an aerodynamic diameter of less than 2.5 µm (PM2.5) using various Machine Learning (ML) models. The study compares ML models to deep learning models and emphasizes the explainability of ML models for PM2.5 prediction.
Materials and methods: We examine different combinations of features and time window dimensions to evaluate the performance of ML models. The study utilizes Support Vector Machine (SVM), Multi-Layer Perceptron (MLP), Decision Tree (DT), and five Ensemble Learning (EL) models: AdaBoost, XGBoost, LightGBM, CatBoost, and Random Forest (RF). The dataset includes three years of daily measurements of weather parameters and PM2.5.
Results: Lagged values of PM2.5 improve prediction performance, particularly when the lagged-value window spans seven days or multiples thereof. This confirms that road traffic, which exhibits a weekly seasonality, is the primary source of PM2.5 in Algiers. Interestingly, including lagged values of weather parameters decreases prediction performance, even when they are chosen based on their correlation with PM2.5. The AdaBoost model performs best, achieving a Root Mean Squared Error (RMSE) of 2.899 µg/m³ and an R² value of 0.96.
Conclusion: EL models, specifically AdaBoost, exhibit strong performance in predicting PM2.5 levels. They not only provide accurate predictions but also allow analysis of feature importance. Lagged values of PM2.5 have a greater impact on predictions than weather parameters; surprisingly, including weather parameters hampers prediction performance. The use of ensemble learning models therefore offers valuable insights into feature significance.
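The lagged-value features described in the results can be built with a simple sliding window over the daily series. A sketch with invented PM2.5 values, not the Algiers data:

```python
def make_lagged_features(series, lags):
    """Build (features, target) rows where each row holds the values at
    t - lag for every requested lag, predicting the value at t. Including
    a 7-day lag lets a model see weekly seasonality such as the
    traffic-driven cycle described in the paper."""
    max_lag = max(lags)
    X, y = [], []
    for t in range(max_lag, len(series)):
        X.append([series[t - lag] for lag in lags])
        y.append(series[t])
    return X, y

# Ten invented daily PM2.5 readings; features are yesterday and one week ago.
daily_pm25 = [12, 15, 14, 30, 22, 18, 16, 13, 17, 15]
X, y = make_lagged_features(daily_pm25, lags=[1, 7])
print(X[0], y[0])  # [16, 12] 13
```

Any regressor (e.g. an AdaBoost ensemble, as in the paper) can then be fitted on `X` and `y`.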

  • Research Article
  • Citations: 8
  • 10.12989/gae.2021.25.1.001
Landslide susceptibility assessment using feature selection-based machine learning models
  • Jan 1, 2021
  • Geomechanics and Engineering
  • Leilei Liu + 2 more

Machine learning models have been widely used for landslide susceptibility assessment (LSA) in recent years. The large number of inputs or conditioning factors for these models, however, can reduce computational efficiency and increase the difficulty of collecting data. Feature selection is a good tool to address this problem by selecting the most important features among all factors to reduce the size of the input variables. However, two important questions need to be solved: (1) how do feature selection methods affect the performance of machine learning models? and (2) which feature selection method is the most suitable for a given machine learning model? This paper aims to address these two questions by comparing the predictive performance of 13 feature selection-based machine learning (FS-ML) models and 5 ordinary machine learning models on LSA. First, five commonly used machine learning models (i.e., logistic regression, support vector machine, artificial neural network, Gaussian process and random forest) and six typical feature selection methods in the literature are adopted to constitute the proposed models. Then, fifteen conditioning factors are chosen as input variables and 1,017 landslides are used as the recorded data. Next, feature selection methods are used to obtain the importance of the conditioning factors to create feature subsets, based on which 13 FS-ML models are constructed. For each of the machine learning models, the best optimized FS-ML model is selected according to the area under curve value. Finally, five optimal FS-ML models are obtained and applied to the LSA of the studied area. The predictive abilities of the FS-ML models on LSA are verified and compared through the receiver operating characteristic curve and statistical indicators such as sensitivity, specificity and accuracy. The results showed that different feature selection methods have different effects on the performance of LSA machine learning models.
FS-ML models generally outperform the ordinary machine learning models. The best FS-ML model is the recursive feature elimination (RFE) optimized RF, and RFE is an optimal method for feature selection.
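The recursive feature elimination behind the best FS-ML model follows a simple loop: score the surviving conditioning factors, drop the least important one, repeat. A generic sketch in which the factor names and importance scores are made up for illustration, not taken from the study:

```python
def recursive_feature_elimination(features, importance_fn, n_keep):
    """Generic RFE loop: repeatedly score the surviving features and drop
    the least important one until n_keep remain. importance_fn maps the
    current feature list to {feature: score}; in the paper's best model
    that role is played by random-forest importances refit each round."""
    kept = list(features)
    while len(kept) > n_keep:
        scores = importance_fn(kept)
        kept.remove(min(kept, key=scores.get))
    return kept

# Toy, fixed importances standing in for refit model scores.
fixed = {"slope": 0.30, "rainfall": 0.25, "lithology": 0.20,
         "ndvi": 0.15, "road_distance": 0.10}
print(recursive_feature_elimination(list(fixed), lambda kept: fixed, 3))
```

A real run would recompute importances on the surviving subset at every iteration; the fixed dictionary here only keeps the example self-contained.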

  • Research Article
  • Citations: 13
  • 10.1007/s00261-021-03051-6
Predicting the stages of liver fibrosis with multiphase CT radiomics based on volumetric features.
  • Mar 22, 2021
  • Abdominal Radiology
  • Enming Cui + 6 more

To develop and externally validate a multiphase computed tomography (CT)-based machine learning (ML) model for staging liver fibrosis (LF) using whole liver slices. The development dataset comprised 232 patients with pathological analysis for LF, and the test dataset comprised 100 patients from an independent outside institution. Feature extraction was performed on the precontrast (PCP), arterial (AP), and portal vein (PVP) phase images, and on three-phase CT images. CatBoost was utilized for ML model investigation using the features with good reproducibility. The diagnostic performance of ML models based on each single- and three-phase CT image was compared with that of radiologists' interpretations, the aminotransferase-to-platelet ratio index, and the fibrosis index based on four factors (FIB-4) using the receiver operating characteristic curve with the area under the curve (AUC) value. Although the ML model based on three-phase CT images (AUC = 0.65-0.80) achieved higher AUC values than those based on PCP (AUC = 0.56-0.69) and PVP (AUC = 0.51-0.74) in predicting various stages of LF, the differences were not significant. The best CT-based ML model (AUC = 0.65-0.80) outperformed FIB-4 in differentiating advanced LF and cirrhosis, and outperformed radiologists' interpretation (AUC = 0.50-0.76) in the diagnosis of significant and advanced LF. All PCP-, PVP-, and three-phase CT-based ML models can be acceptable for assessing LF, and the performance of the PCP-based ML model is comparable to that of the enhanced CT image-based ML models.

  • Research Article
  • 10.1021/acs.joc.5c01229
Deconvoluting the Effects of Substituents on Reaction Barriers through Machine Learning: The Case of Brønsted Acid-Mediated Nazarov Cyclizations.
  • Jul 16, 2025
  • The Journal of organic chemistry
  • Angus Keto + 1 more

The barrier heights of Nazarov cyclizations are influenced by substituents in ways that can be challenging to predict. We explore the ability of machine learning (ML) models to predict the barriers by learning the activating and deactivating contributions of different substituents on the divinyl ketone. Random forest and graph neural network models are shown to achieve good predictive accuracies, with mean absolute errors of ca. 2 kcal/mol against density functional theory-calculated barriers, after training on a compact data set of 272 reactions using simple and interpretable features that do not require performing quantum mechanical calculations. Substituents' electronic and steric feature contributions to the barriers are quantified through feature importance analysis using the SHAP model interpretation framework. To achieve a low barrier for cyclization, the most impactful substrate design strategy entails placing substituents α to the carbonyl group to sterically control the accessible reactant conformations and electronically stabilize the forming oxyallyl cation. The analysis also explains low or high barriers for several substrates that do not conform to empirical rules. This work illustrates the ability of ML models to capture complicated synergistic and antagonistic effects on chemical reactivity, and demonstrates how ML model feature importance analysis complements insights available from density functional theory (DFT).

  • Research Article
  • Citations: 36
  • 10.1097/corr.0000000000001360
Does Artificial Intelligence Outperform Natural Intelligence in Interpreting Musculoskeletal Radiological Studies? A Systematic Review.
  • Jul 30, 2020
  • Clinical Orthopaedics & Related Research
  • Olivier Q Groot + 7 more

Machine learning (ML) is a subdomain of artificial intelligence that enables computers to abstract patterns from data without explicit programming. A myriad of impactful ML applications already exists in orthopaedics ranging from predicting infections after surgery to diagnostic imaging. However, no systematic reviews that we know of have compared, in particular, the performance of ML models with that of clinicians in musculoskeletal imaging to provide an up-to-date summary regarding the extent of applying ML to imaging diagnoses. By doing so, this review delves into where current ML developments stand in aiding orthopaedists in assessing musculoskeletal images. This systematic review aimed (1) to compare performance of ML models versus clinicians in detecting, differentiating, or classifying orthopaedic abnormalities on imaging by (A) accuracy, sensitivity, and specificity, (B) input features (for example, plain radiographs, MRI scans, ultrasound), (C) clinician specialties, and (2) to compare the performance of clinician-aided versus unaided ML models. A systematic review was performed in PubMed, Embase, and the Cochrane Library for studies published up to October 1, 2019, using synonyms for machine learning and all potential orthopaedic specialties. We included all studies that compared ML models head-to-head against clinicians in the binary detection of abnormalities in musculoskeletal images. After screening 6531 studies, we ultimately included 12 studies. We conducted quality assessment using the Methodological Index for Non-randomized Studies (MINORS) checklist. All 12 studies were of comparable quality, and they all clearly included six of the eight critical appraisal items (study aim, input feature, ground truth, ML versus human comparison, performance metric, and ML model description). 
This justified summarizing the findings in a quantitative form by calculating the median absolute improvement of the ML models compared with clinicians for the following metrics of performance: accuracy, sensitivity, and specificity. ML models provided, in aggregate, only very slight improvements in diagnostic accuracy and sensitivity compared with clinicians working alone and were on par in specificity (3% (interquartile range [IQR] -2.0% to 7.5%), 0.06% (IQR -0.03 to 0.14), and 0.00 (IQR -0.048 to 0.048), respectively). Inputs used by the ML models were plain radiographs (n = 8), MRI scans (n = 3), and ultrasound examinations (n = 1). Overall, ML models outperformed clinicians more when interpreting plain radiographs than when interpreting MRIs (17 of 34 and 3 of 16 performance comparisons, respectively). Orthopaedists and radiologists performed similarly to ML models, while ML models mostly outperformed other clinicians (outperformance in 7 of 19, 7 of 23, and 6 of 10 performance comparisons, respectively). Two studies evaluated the performance of clinicians aided and unaided by ML models; both demonstrated considerable improvements in ML-aided clinician performance by reporting a 47% decrease of misinterpretation rate (95% confidence interval [CI] 37 to 54; p < 0.001) and a mean increase in specificity of 0.048 (95% CI 0.029 to 0.068; p < 0.001) in detecting abnormalities on musculoskeletal images. At present, ML models have comparable performance to clinicians in assessing musculoskeletal images. ML models may enhance the performance of clinicians as a technical supplement rather than as a replacement for clinical intelligence. Future ML-related studies should emphasize how ML models can complement clinicians, instead of determining the overall superiority of one versus the other. 
This can be accomplished by improving transparent reporting, diminishing bias, determining the feasibility of implementation in the clinical setting, and appropriately tempering conclusions. Level III, diagnostic study.
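The median absolute improvement with interquartile range used to summarize the ML-versus-clinician comparisons can be computed as below; the per-study improvement values shown are hypothetical, not the review's data:

```python
import statistics

def median_iqr(values):
    """Median and interquartile range (Q1, Q3), as used to summarize
    per-study ML-minus-clinician performance differences."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q2, (q1, q3)

# Hypothetical accuracy improvements (percentage points) across five studies.
improvements = [-2.0, 1.0, 3.0, 6.0, 7.5]
print(median_iqr(improvements))  # (3.0, (1.0, 6.0))
```

Note that a negative lower quartile, as in the review's reported IQRs, means some studies found the clinicians ahead of the models.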

  • Supplementary Content
  • Citations: 23
  • 10.2196/35293
Comparison of Severity of Illness Scores and Artificial Intelligence Models That Are Predictive of Intensive Care Unit Mortality: Meta-analysis and Review of the Literature
  • May 31, 2022
  • JMIR Medical Informatics
  • Cristina Barboi + 2 more

Background: Severity of illness scores—Acute Physiology and Chronic Health Evaluation, Simplified Acute Physiology Score, and Sequential Organ Failure Assessment—are current risk stratification and mortality prediction tools used in intensive care units (ICUs) worldwide. Developers of artificial intelligence or machine learning (ML) models predictive of ICU mortality use the severity of illness scores as a reference point when reporting the performance of these computational constructs. Objective: This study aimed to perform a literature review and meta-analysis of articles that compared binary classification ML models with the severity of illness scores that predict ICU mortality and determine which models have superior performance. This review intends to provide actionable guidance to clinicians on the performance and validity of ML models in supporting clinical decision-making compared with the severity of illness score models. Methods: Between December 15 and 18, 2020, we conducted a systematic search of PubMed, Scopus, Embase, and IEEE databases and reviewed studies published between 2000 and 2020 that compared the performance of binary ML models predictive of ICU mortality with the performance of severity of illness score models on the same data sets. We assessed the studies' characteristics, synthesized the results, meta-analyzed the discriminative performance of the ML and severity of illness score models, and performed tests of heterogeneity within and among studies. Results: We screened 461 abstracts, of which we assessed the full text of 66 (14.3%) articles. We included in the review 20 (4.3%) studies that developed 47 ML models based on 7 types of algorithms and compared them with 3 types of severity of illness score models.
Of the 20 studies, 4 (20%) were found to have a low risk of bias and applicability in model development, 7 (35%) performed external validation, 9 (45%) reported on calibration, 12 (60%) reported on classification measures, and 4 (20%) addressed explainability. The discriminative performance of the ML-based models, reported as AUROC, ranged between 0.728 and 0.99, and between 0.58 and 0.86 for the severity of illness score–based models. We noted substantial heterogeneity among the reported models and considerable variation among the AUROC estimates for both ML and severity of illness score model types. Conclusions: ML-based models can accurately predict ICU mortality as an alternative to traditional scoring models. Although the range of performance of the ML models is superior to that of the severity of illness score models, the results cannot be generalized due to the high degree of heterogeneity. When presented with the option of choosing between severity of illness score or ML models for decision support, clinicians should select models that have been externally validated, tested in the practice environment, and updated to the patient population and practice environment. Trial Registration: PROSPERO CRD42021203871; https://tinyurl.com/28v2nch8

  • Research Article
  • Citations: 2
  • 10.1371/journal.pone.0307531
Prognosing post-treatment outcomes of head and neck cancer using structured data and machine learning: A systematic review.
  • Jul 24, 2024
  • PloS one
  • Mohammad Moharrami + 7 more

This systematic review aimed to evaluate the performance of machine learning (ML) models in predicting post-treatment survival and disease progression outcomes, including recurrence and metastasis, in head and neck cancer (HNC) using clinicopathological structured data. A systematic search was conducted across the Medline, Scopus, Embase, Web of Science, and Google Scholar databases. The methodological characteristics and performance metrics of studies that developed and validated ML models were assessed. The risk of bias was evaluated using the Prediction model Risk Of Bias ASsessment Tool (PROBAST). Out of 5,560 unique records, 34 articles were included. For survival outcomes, ML models outperformed the Cox proportional hazards model in time-to-event analyses for HNC, with a concordance index of 0.70-0.79 vs. 0.66-0.76, and for all subsites, including the oral cavity (0.73-0.89 vs. 0.69-0.77) and larynx (0.71-0.85 vs. 0.57-0.74). In binary classification analyses, the area under the receiver operating characteristic curve (AUROC) of ML models ranged from 0.75 to 0.97, with an F1-score of 0.65-0.89 for HNC; an AUROC of 0.61-0.91 and F1-score of 0.58-0.86 for the oral cavity; and an AUROC of 0.76-0.97 and F1-score of 0.63-0.92 for the larynx. Disease-specific survival outcomes showed higher performance than overall survival outcomes, but the performance of ML models did not differ between three- and five-year follow-up durations. For disease progression outcomes, no time-to-event metrics were reported for ML models. For binary classification of the oral cavity, the only evaluated subsite, the AUROC ranged from 0.67 to 0.97, with F1-scores between 0.53 and 0.89. ML models have demonstrated considerable potential in predicting post-treatment survival and disease progression, consistently outperforming traditional linear models and their derived nomograms.
Future research should incorporate more comprehensive treatment features, emphasize disease progression outcomes, and establish model generalizability through external validations and the use of multicenter datasets.
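The concordance index used in the time-to-event comparisons is Harrell's C: among comparable patient pairs, the fraction in which the patient who failed earlier carried the higher predicted risk. A pure-Python sketch on an invented toy cohort:

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C. A pair (i, j) is comparable when patient i has the
    earlier time and an observed event (events[i] == 1); it is concordant
    when i also has the higher risk score (ties scored 0.5)."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable

# Toy cohort: follow-up in months; event = 1 means the outcome was observed.
print(concordance_index([5, 10, 15], [1, 1, 0], [0.9, 0.5, 0.2]))  # 1.0
```

A value of 0.5 indicates chance-level ranking, which puts the reported 0.70-0.79 range for ML models in context.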

  • Research Article
  • 10.1109/embc58623.2025.11252666
Enhancing Colon Cancer Risk Prediction in Machine Learning Models using Polygenic Risk Scores.
  • Jul 1, 2025
  • Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference
  • Kyungbeom Kim + 5 more

Colon cancer is one of the deadliest types of cancer in the United States, with close to 50,000 projected deaths in 2024. The disease requires early diagnosis to optimize the chances of survival by enabling timely administration of treatment. To investigate the key non-genetic (NG) factors influencing the onset of colon cancer and evaluate how genetic factors enhance the performance of machine learning (ML) models in predicting incidence, we incorporated polygenic risk scores (PRSs) alongside NG data in ML models to predict the 10-year incident risk of colon cancer using data from the UK Biobank. This approach enabled us to assess the added predictive value of PRSs in multi-modal models for estimating the 10-year risk of developing colon cancer over NG data alone. Moreover, our research focused on identifying the most relevant and predictive PRSs and validating them using a robust ML framework. To ensure robustness, we restricted the cohort to White British individuals to minimize ancestry-related heterogeneity. PRSs have proven effective in enhancing disease prediction for conditions such as breast cancer, myocardial infarction, and schizophrenia, reinforcing their relevance in clinical research. Exploring six PRSs, our goal was to minimize false negatives while simultaneously maximizing the area under the receiver-operating characteristic curve (AUC), in order to improve early detection rates by identifying those at risk for colon cancer. This research shows that PRSs can enhance the overall predictive ability of ML models in colon cancer research over NG factors alone, bolstering the argument for incorporating PRSs into routine clinical practice. PRSs can also help minimize false negatives, a key feature for disease prediction models, as missed potential diagnoses are life-threatening.

  • Research Article
  • Citations: 3
  • 10.14309/ctg.0000000000000705
Machine Learning-Based Prediction Models for Clostridioides difficile Infection: A Systematic Review.
  • Jun 1, 2024
  • Clinical and translational gastroenterology
  • Raseen Tariq + 5 more

Despite research efforts, predicting Clostridioides difficile incidence and its outcomes remains challenging. The aim of this systematic review was to evaluate the performance of machine learning (ML) models in predicting C. difficile infection (CDI) incidence and complications using clinical data from electronic health records. We conducted a comprehensive search of databases (OVID, Embase, MEDLINE ALL, Web of Science, and Scopus) from inception up to September 2023. Studies employing ML techniques for predicting CDI or its complications were included. The primary outcome was the type and performance of ML models assessed using the area under the receiver operating characteristic curve. Twelve retrospective studies that evaluated CDI incidence and/or outcomes were included. The most commonly used ML models were random forest and gradient boosting. The area under the receiver operating characteristic curve ranged from 0.60 to 0.81 for predicting CDI incidence, 0.59 to 0.80 for recurrence, and 0.64 to 0.88 for predicting complications. Advanced ML models demonstrated similar performance to traditional logistic regression. However, there was notable heterogeneity in defining CDI and the different outcomes, including incidence, recurrence, and complications, and a lack of external validation in most studies. ML models show promise in predicting CDI incidence and outcomes. However, the observed heterogeneity in CDI definitions and the lack of real-world validation highlight challenges in clinical implementation. Future research should focus on external validation and the use of standardized definitions across studies.

  • Research Article
  • Citations: 3
  • 10.1016/j.ins.2023.01.072
Data-driven evolutionary multi-task optimization for problems with complex solution spaces
  • Jan 13, 2023
  • Information Sciences
  • Chao Lyu + 2 more


  • Research Article
  • Citations: 17
  • 10.2139/ssrn.3774075
Understanding the Performance of Machine Learning Models to Predict Credit Default: A Novel Approach for Supervisory Evaluation
  • Jan 27, 2021
  • SSRN Electronic Journal
  • Andrés Alonso + 1 more

In this paper we study the performance of several machine learning (ML) models for credit default prediction. We do so by using a unique and anonymized database from a major Spanish bank. We compare the statistical performance of a simple and traditionally used model like the Logistic Regression (Logit) with more advanced ones like Lasso penalized logistic regression, Classification And Regression Tree (CART), Random Forest, XGBoost and Deep Neural Networks. Following the process deployed for the supervisory validation of Internal Rating-Based (IRB) systems, we examine the benefits of using ML in terms of predictive power, both in classification and calibration. Running a simulation exercise for different sample sizes and numbers of features, we are able to isolate the information advantage associated with access to large amounts of data and measure the ML model advantage. Although ML models outperform Logit both in classification and in calibration, more complex ML algorithms do not necessarily predict better. We then translate this statistical performance into economic impact. We do so by estimating the savings in regulatory capital when using ML models instead of a simpler model like Lasso to compute the risk-weighted assets. Our benchmark results show that implementing XGBoost could yield savings from 12.4% to 17% in terms of regulatory capital requirements under the IRB approach. This leads us to conclude that the potential benefits in economic terms for the institutions would be significant, which justifies further research to better understand all the risks embedded in ML models.
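The paper evaluates models on calibration as well as classification. A standard calibration metric is the Brier score, the mean squared error between predicted default probabilities and realized outcomes (lower is better). A minimal sketch comparing two models on hypothetical toy data (the labels and probabilities below are invented for illustration, not drawn from the paper):

```python
def brier_score(labels, probs):
    """Mean squared error between predicted default probabilities (0..1)
    and observed outcomes (0 = repaid, 1 = defaulted); lower is better."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)

# Hypothetical predicted default probabilities from two models on the same loans
labels      = [0, 0, 1, 0, 1, 0, 0, 1]
logit_probs = [0.10, 0.20, 0.60, 0.30, 0.50, 0.15, 0.25, 0.70]
xgb_probs   = [0.05, 0.10, 0.80, 0.20, 0.70, 0.10, 0.15, 0.85]

for name, probs in [("logit", logit_probs), ("xgboost", xgb_probs)]:
    print(name, round(brier_score(labels, probs), 4))
```

In a supervisory-validation setting one would complement this with reliability diagrams and discrimination metrics (e.g. AUROC), since a model can be well calibrated on average yet rank borrowers poorly.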

  • Research Article
  • 10.1093/jamiaopen/ooae157
Evaluating dimensionality reduction of comorbidities for predictive modeling in individuals with neurofibromatosis type 1.
  • Dec 26, 2024
  • JAMIA open
  • Aditi Gupta + 7 more

Dimensionality reduction techniques aim to enhance the performance of machine learning (ML) models by reducing noise and mitigating overfitting. We sought to compare the effect of different dimensionality reduction methods for comorbidity features extracted from electronic health records (EHRs) on the performance of ML models for predicting the development of various sub-phenotypes in children with Neurofibromatosis type 1 (NF1). EHR-derived data from pediatric subjects with a confirmed clinical diagnosis of NF1 were used to create 10 unique comorbidity code-derived feature sets by incorporating dimensionality reduction techniques using raw International Classification of Diseases codes, Clinical Classifications Software Refined, and Phecode mapping schemes. We compared the performance of logistic regression, XGBoost, and random forest models utilizing each feature set. XGBoost-based predictive models were most successful at predicting NF1 sub-phenotypes. Overall, features based on domain knowledge-informed mapping schemas performed better than unsupervised feature reduction methods. High-level features exhibited the worst performance across models and outcomes, suggesting excessive information loss with over-aggregation of features. Model performance is significantly impacted by dimensionality reduction techniques and varies by specific ML algorithm and outcome being predicted. Automated methods using existing knowledge and ontology databases can effectively aggregate features extracted from EHRs. Dimensionality reduction through feature aggregation can enhance the performance of ML models, particularly in high-dimensional datasets with small sample sizes, as commonly found in EHR-based health applications. However, if not carefully optimized, it can lead to information loss and data oversimplification, potentially adversely affecting model performance.
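The knowledge-informed aggregation the abstract describes amounts to collapsing many raw diagnosis codes into a smaller set of clinically meaningful category indicators before modeling. A minimal sketch of that idea, using a hypothetical hand-built mapping (the real study used full Clinical Classifications Software Refined and Phecode schemes, not this toy dictionary):

```python
# Hypothetical mapping of raw ICD-10 codes to coarser comorbidity categories.
# Illustrative only; production pipelines would load CCSR or Phecode tables.
ICD_TO_CATEGORY = {
    "G40.909": "epilepsy",
    "G40.309": "epilepsy",
    "F90.0": "adhd",
    "F90.9": "adhd",
    "M41.9": "scoliosis",
}

def aggregate_features(icd_codes):
    """Collapse a patient's raw ICD codes into binary category indicators,
    ignoring codes outside the mapping (e.g. routine-encounter codes)."""
    categories = sorted({ICD_TO_CATEGORY[c] for c in icd_codes if c in ICD_TO_CATEGORY})
    return {cat: 1 for cat in categories}

patient_codes = ["G40.909", "F90.0", "F90.9", "Z00.129"]
print(aggregate_features(patient_codes))  # → {'adhd': 1, 'epilepsy': 1}
```

This shrinks the feature space (two indicators instead of four raw-code columns here) while preserving clinical meaning, which is the trade-off the study evaluates: too little aggregation keeps noise, too much loses signal.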
