Introduction: Pediatric cerebral venous sinus thrombosis (CVST) is a rare but serious disorder that carries a risk of neurologic sequelae and mortality. Predicting health outcomes for risk stratification and tailored therapy is difficult because the available literature relies on single-center studies with limited sample sizes. We analyzed CVST data in the Pediatric Health Information System (PHIS) database, a large, multicenter administrative database of pediatric inpatient data, to assess whether supervised machine learning can help identify pediatric CVST patients at increased risk of adverse outcomes. Methods: Patients 0 to 17 years of age with a diagnosis of CVST were identified in the PHIS database from 2011 to 2022. Demographic information, clinical characteristics, evaluation and management, and outcome data were extracted from diagnostic, imaging, procedure, and pharmaceutical billing codes. An XGBoost model was fitted using 10-fold cross-validation with Bayesian optimization for hyperparameter tuning. Clinical parameters with values suitable for analysis were used to train the model on the “early era” (2011-2015) dataset, and the model was tested on the “late era” (2016-2022) dataset. The final model was chosen based on the highest test-set AUC. Clinical parameters with significant SHAP values in the model were compared with published risk factors for mortality from the International Pediatric Stroke Study (IPSS) (RN Ichord, et al. Arch Dis Child 2015). Results: A total of 6080 hospitalizations with a CVST diagnosis (0.1% of all hospitalizations) were identified. The clinical parameters used for machine learning were age, gender, race, major complications (intracranial bleeding, hydrocephalus, coma), Child Opportunity Index, and ICU admission (Table 1). The overall mortality rate at discharge was 4.2%. Of these hospitalizations, 2029 occurred in the early era and 4051 in the late era.
With the analyzable clinical parameters, the XGBoost model could predict mortality with a probability of 0.7906. Of the clinical parameters used for machine learning (Figure 1), ICU admission, intracranial bleeding, hydrocephalus, and coma had positive SHAP values, indicating a higher predicted mortality risk, whereas older age and non-Hispanic White status had negative SHAP values, indicating a lower predicted mortality risk. The IPSS study reported similar risk associations, with intracranial bleeding, younger age, and coma conferring higher mortality risk. Conclusion: In pediatric CVST, our machine learning analysis shows a higher risk of mortality with ICU admission and with complications including intracranial bleeding, hydrocephalus, and coma, and a lower risk of mortality with older age and non-Hispanic White status; these findings are consistent with IPSS study observations. Our study also illustrates how investigators can use machine learning to predict adverse clinical outcomes in pediatric CVST. For rare diseases such as pediatric CVST, which are difficult to study because of limited sample sizes, data from large databases such as PHIS can be used for machine learning to predict health care outcomes, which may in turn inform future clinical practice and improve patient outcomes.
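Illustrative sketch: The Methods describe training on an early era, testing on a late era, and selecting the model by test-set AUC. The minimal Python example below shows only that evaluation logic, not the XGBoost fitting, Bayesian tuning, or SHAP analysis. The `Hospitalization` record, its field names, and the toy values are hypothetical and are not drawn from PHIS; the risk scores stand in for the output of a model that has already been trained. The AUC is computed from its pairwise definition: the probability that a randomly chosen death receives a higher predicted risk than a randomly chosen survivor, with ties counted as one half.

```python
from dataclasses import dataclass


@dataclass
class Hospitalization:
    # Hypothetical record layout, for illustration only.
    year: int          # calendar year of the hospitalization
    died: bool         # mortality at discharge
    risk_score: float  # predicted mortality risk from an already-trained model


def split_by_era(records, last_early_year=2015):
    """Split records into an early (training) era and a late (test) era."""
    early = [r for r in records if r.year <= last_early_year]
    late = [r for r in records if r.year > last_early_year]
    return early, late


def auc(labels, scores):
    """Pairwise AUC: share of (death, survivor) pairs ranked correctly.

    Tied scores contribute one half. This is quadratic in the number of
    records, which is acceptable for a small illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one death and one survivor")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (invented values, not study data).
records = [
    Hospitalization(2012, False, 0.10),
    Hospitalization(2014, True, 0.80),
    Hospitalization(2017, False, 0.20),
    Hospitalization(2018, True, 0.90),
    Hospitalization(2019, False, 0.40),
    Hospitalization(2021, True, 0.30),
]

early, late = split_by_era(records)
late_auc = auc([r.died for r in late], [r.risk_score for r in late])
print(f"early n={len(early)}, late n={len(late)}, late-era AUC={late_auc:.2f}")
```

Splitting by calendar era rather than at random means the test set checks whether a model built on older admissions still ranks risk correctly in more recent ones, which is closer to how such a model would be used prospectively.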