Tree-based Models Research Articles

Background: External comparisons of antimicrobial use (AU) may be more informative if adjusted for encounter characteristics. Optimal methods to define input variables for encounter-level risk-adjustment models of AU are not established.

Methods: This retrospective analysis of electronic health record data included 50 US hospitals in 2020-2021. We used NHSN definitions for all antibacterial days of therapy (DOT), including adult and pediatric encounters with at least 1 day present in inpatient locations. We assessed 4 methods to define input variables: 1) diagnosis-related group (DRG) categories by Yu et al., 2) adjudicated Elixhauser comorbidity categories by Goodman et al., 3) all Clinical Classification Software Refined (CCSR) diagnosis and procedure categories, and 4) adjudicated CCSR categories, where codes not appropriate for AU risk-adjustment were excluded by expert consensus; this required review of 867 codes over 4 months. Data were split randomly, stratified by bed size, as follows: 1) a training dataset including two-thirds of encounters among two-thirds of hospitals; 2) an internal testing set including one-third of encounters within training hospitals; and 3) an external testing set including the remaining one-third of hospitals. We used a gradient-boosted machine (GBM) tree-based model and a two-stage approach to first identify encounters with zero DOT, then estimate DOT among those with >0.5 probability of receiving antibiotics. Accuracy was assessed using mean absolute error (MAE) in the testing datasets. Correlation plots compared model estimates and observed DOT in the testing datasets. The top 20 most influential variables were defined using modeled variable importance.

Results: Our datasets included 629,445 training, 314,971 internal testing, and 419,109 external testing encounters. Demographic data included 41% male, 59% non-Hispanic White, 25% non-Hispanic Black, 9% Hispanic, and 5% pediatric encounters. DRG was missing in 29% of encounters. MAE was lower in pediatric than in adult encounters, and lowest for models incorporating CCSR inputs (Figure 1). Performance in internal and external testing was similar, though the Goodman/Elixhauser variable strategy was less accurate in external testing and underestimated long DOT outliers (Figure 2). Estimates from the agnostic (all-category) and adjudicated CCSR models were highly correlated, and their lists of influential variables were similar (Figure 3).

Conclusion: Larger numbers of CCSR diagnosis and procedure inputs improved risk-adjustment model accuracy compared with prior strategies. Variable importance and accuracy were similar for the agnostic and adjudicated approaches. However, maintaining adjudications by experts would require significant time and could introduce personal bias. If these findings are confirmed, the need for expert adjudication of input variables should be reconsidered.

Disclosure: Elizabeth Dodds Ashley: Advisor- HealthTrackRx. David J Weber: Consultant on vaccines: Pfizer; DSMB chair: GSK; Consultant on disinfection: BD, GAMA, PDI, Germitec
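
The data split and two-stage scoring described in the Methods can be sketched roughly as follows. This is an illustration only, not the authors' implementation: the function names (`split_encounters`, `two_stage_estimate`, `mean_absolute_error`) and the record layout are assumptions, bed-size stratification is omitted, and the stage-1 probabilities and stage-2 DOT estimates are made-up numbers standing in for the output of fitted GBM models.

```python
import random


def split_encounters(encounters, seed=0):
    """Split (hospital_id, encounter_id) pairs into train / internal / external.

    Two-thirds of hospitals are training hospitals. Within them, roughly
    two-thirds of encounters go to training and one-third to internal testing.
    Encounters from the remaining hospitals form the external testing set.
    Bed-size stratification is omitted here for brevity (an assumption).
    """
    rng = random.Random(seed)
    hospitals = sorted({h for h, _ in encounters})
    rng.shuffle(hospitals)
    n_train_hosp = round(len(hospitals) * 2 / 3)
    train_hosp = set(hospitals[:n_train_hosp])

    train, internal, external = [], [], []
    for enc in encounters:
        if enc[0] not in train_hosp:
            external.append(enc)
        elif rng.random() < 2 / 3:
            train.append(enc)
        else:
            internal.append(enc)
    return train, internal, external


def two_stage_estimate(p_any_antibiotic, dot_if_treated, threshold=0.5):
    """Return 0 DOT unless stage 1 predicts antibiotic receipt above threshold."""
    return dot_if_treated if p_any_antibiotic > threshold else 0.0


def mean_absolute_error(observed, estimated):
    """Average absolute difference between observed and estimated DOT."""
    return sum(abs(o - e) for o, e in zip(observed, estimated)) / len(observed)


# Toy usage with made-up numbers (not study data).
encounters = [(h, e) for h in range(6) for e in range(30)]
train, internal, external = split_encounters(encounters)

stage1_prob = [0.1, 0.8, 0.6, 0.4]   # hypothetical stage-1 probabilities
stage2_dot = [3.0, 5.0, 2.0, 7.0]    # hypothetical stage-2 DOT estimates
observed = [0.0, 4.0, 3.0, 1.0]
estimates = [two_stage_estimate(p, d) for p, d in zip(stage1_prob, stage2_dot)]
print(estimates, mean_absolute_error(observed, estimates))
```

Splitting by hospital first is what makes the external set a test of transportability: no encounter from an external hospital is seen during training.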

Machine learning is increasingly applied in medicine. We used a public dataset, MIMIC-III, to develop compact models that accurately predict the outcome of mechanically ventilated patients in the first 24 h of a first-time hospital admission. Sixty-seven predictive features, grouped into 6 categories, were selected for the classification and prediction task. Four tree-based algorithms (Decision Tree, Bagging, eXtreme Gradient Boosting and Random Forest) and five non-tree-based algorithms (Logistic Regression, K-Nearest Neighbour, Linear Discriminant Analysis, Support Vector Machine and Naïve Bayes) were employed to predict the outcome of 18,883 mechanically ventilated patients. Five scenarios were crafted to mirror the target populations used in the existing literature. S1.1 reflected an imbalanced situation, with significantly fewer mortality cases than survival cases, and with similar target class distributions in the training and test sets. S1.2 and S2.2 featured balanced classes, obtained by removing majority-class instances from the test set and/or the training set. S1.3 and S2.3 generated additional instances of the minority class via the Synthetic Minority Over-sampling Technique (SMOTE). Standard evaluation metrics were used to determine the best-performing models for each scenario. For the best performers, Autofeat, an automated feature engineering library, was used to eliminate less important features per scenario. Tree-based models generally outperformed the non-tree-based ones. Moreover, eXtreme Gradient Boosting (XGB) consistently yielded the highest AUC score (between 0.91 and 0.97), while exhibiting relatively high sensitivity (between 0.58 and 0.88) in 4 scenarios (1.2, 2.2, 1.3, and 2.3). After a substantial reduction in the number of predictors, the selected calibrated ML models still achieved similar AUC and MCC scores across those scenarios. The calibration curves of the XGB and Bagging (BG) models in Scenario 2.2, both before and after dimension reduction, aligned more closely with the perfect calibration line than the curves produced by other algorithms. This study demonstrated that dimension-reduced models can perform well and retain the important features for the classification task. Deploying a compact machine learning model into production helps reduce the costs of computational resources and of monitoring changes in input data over time.
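
The two rebalancing strategies behind the scenarios (removing majority-class instances, and SMOTE-style oversampling), together with the sensitivity and MCC metrics, can be sketched with the Python standard library alone. This is a simplified illustration of the general techniques, not the study's code: the function names, the toy data, and the parameter `k` are assumptions, and a binary outcome with at least two minority samples is assumed.

```python
import math
import random


def undersample_majority(X, y, majority=0, seed=0):
    """Randomly drop majority-class rows until both classes have equal counts."""
    rng = random.Random(seed)
    maj = [i for i, label in enumerate(y) if label == majority]
    mino = [i for i, label in enumerate(y) if label != majority]
    keep = sorted(rng.sample(maj, len(mino)) + mino)
    return [X[i] for i in keep], [y[i] for i in keep]


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def sensitivity_and_mcc(tp, fp, tn, fn):
    """Sensitivity (recall) and Matthews correlation coefficient from counts."""
    sens = tp / (tp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sens, mcc


# Toy data (made up): 6 majority-class rows (label 0), 3 minority rows (label 1).
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.2, 0.4], [0.4, 0.1],
     [1.0, 1.0], [1.2, 0.9], [0.9, 1.1]]
y = [0] * 6 + [1] * 3

X_bal, y_bal = undersample_majority(X, y)
minority = [row for row, label in zip(X, y) if label == 1]
new_rows = smote(minority, n_new=3, k=2)
sens, mcc = sensitivity_and_mcc(tp=8, fp=2, tn=8, fn=2)
print(len(X_bal), len(new_rows), sens, mcc)
```

Undersampling discards information from the majority class, whereas SMOTE keeps every original row and adds interpolated ones, which is one reason the two families of scenarios can yield different sensitivity.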

Articles published on Tree-based Models

Uncovering phishing attacks using principles of persuasion analysis

A Comparison of Variable Input Strategies used for Risk-adjustment Models of Antimicrobial Use

Gradient Boosting Decision Tree-Based PMM Model Integrated Into FDTD Method for Solving Subsurface Sensing Problems

A Tree-Based World Model for Reducing System Complexity in Autonomous Mobile Manipulation

Efficacy of Tree-Based Models for Pipe Failure Prediction and Condition Assessment: A Comprehensive Review

Probabilistic and explainable tree-based models for rotational reactionary flight delay prediction

A new integrated framework to fault detection and diagnosis of air handling unit: Emphasizing the impact of symptoms

Molecular delimitation of cryptic Australian squid species of the genus Uroteuthis Rehder, 1945 (Cephalopoda: Loliginidae), provides a baseline of diversity to resolve classification challenges throughout the Indo-Pacific

Data analysis and machine learning aided integrated catalyst activity and process modelling for selective H2 production from biomass gasification

Machine learning analysis of thermophysical and thermohydraulic properties in ethylene glycol- and glycerol-based SiO2 nanofluids

Temporal heterogeneity in the performance of machine learning models for PM2.5 concentration estimation

A data quality management framework for equipment failure risk estimation: Application to the oil and gas industry

A tree-based explainable AI model for early detection of Covid-19 using physiological data

PnT: Born-again tree-based model via fused decision path encoding

Compact machine learning model for the accurate prediction of first 24-hour survival of mechanically ventilated patients.

Deriving PM2.5 from satellite observations with spatiotemporally weighted tree-based algorithms: enhancing modeling accuracy and interpretability

SpChar: Characterizing the sparse puzzle via decision trees

Predicting the PCM-incorporated building's performance using optimized linear kernel and tree-based machine learning methods

Light curve classification with DistClassiPy: A new distance-based classifier

Assessing and analysing energy system balance: A decision tree approach
