Introduction: Venous thromboembolism (VTE) as a hospital-acquired condition (HAC), i.e. not ‘present on admission’ (POA), is a potentially preventable complication. A decrease in HAC VTE events indicates that efforts to prevent VTE in hospitalized patients are succeeding. So far, however, identifying patients with HAC VTE has required costly chart reviews. We investigated whether electronic health record data, such as medication orders and their temporal relations, can differentiate HAC from POA VTE. To this end, we modeled a single tree and two random forests and evaluated the automated classification of HAC VTE.

Methods: All inpatients with a length of stay of ≥24 hours (h) who were discharged from the Brigham and Women’s Hospital, a large tertiary care hospital in Boston, MA, between January 2009 and April 2014 were searched for ICD-9 diagnosis codes of acute venous thrombosis or pulmonary embolism. Patients were included if VTE appeared in the admitting diagnosis field (defined as POA VTE) or in one of up to 50 discharge diagnoses. Of those, only patients who received heparin, dalteparin, enoxaparin, alteplase, rivaroxaban or fondaparinux were considered, and the time from admission to the first order was calculated for each drug. Additional predictors were dose information, demographics (age, gender, race, language), length of stay, admission service, discharge service, transfer destination of the patient after discharge, and whether the patient was alive or died during the hospitalization or within 30 days after discharge. A single tree and two random forests (each with 5,000 trees) were generated to analyze the predictors and to assess the predictive power of the chosen approach. Since medication orders are electronically available in real time, predictors that are known prospectively may have implications for clinical decision support. The prospective predictors (i.e. demographics, admission service, time to order a drug, and route and dose information for each drug) were therefore analyzed separately in the first random forest. Half of the data served as the calibration set and half as the validation set. Statistical computing was performed using the software R version 3.1.0 (R Foundation for Statistical Computing, Vienna, Austria).

Results: A total of 5,374 patient stays featured a VTE diagnosis with a defined drug order. If VTE was POA (n=1,262; 23.5%), the median time to order one of the aforementioned drugs was 2.5h (IQR 1.3-5.0h). For HAC VTE, i.e. stays without an admitting diagnosis of VTE (n=4,112; 76.5%), the median time to order the drug was 4.2h (IQR 1.7-18.2h). Unsurprisingly, a single tree, after cross-validation and pruning, identified the time from admission to the ordering of intravenous (IV) heparin as the most significant predictor (Fig. 1). On validation, this tree had an accuracy of 78.8% and a positive predictive value (PPV) of 83.3% for the classification of HAC VTE. The first random forest, which used only predictors available in real time, had a validated accuracy of 79.7% and a PPV of 85.3% for the classification of HAC VTE. The second random forest, which considered all variables, had a validated accuracy of 81.7% and a PPV of 87.8% (variable importance is shown in Fig. 2).

Discussion: We modeled a tree and two random forests using structured data predictors to differentiate between HAC and POA VTE. Our validated tree (Fig. 1), which considers the first order for IV heparin and the length of stay, could be implemented immediately as a first step toward identifying HAC VTE patients. However, the random forests performed better, even when exclusively prospective predictors were used, and such real-time models may have implications for clinical decision support tools.
In conclusion, our random forests could help to evaluate interventions to improve thromboprophylaxis regimens for inpatients, a task that currently requires costly chart reviews to differentiate between POA VTE and potentially preventable complications.
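
The two quantities this abstract relies on, the time from admission to the first order of a drug and the accuracy and PPV of a HAC/POA classification, can be sketched in a few lines. The sketch is in Python for illustration only (the study itself used R 3.1.0); the function names, the input layout and the toy values are assumptions made here and are not the study's code or data.

```python
from datetime import datetime


def hours_to_first_order(admitted_at, order_times):
    """Hours from admission to the earliest order of one drug.

    Returns None when the drug was never ordered during the stay.
    """
    if not order_times:
        return None
    return (min(order_times) - admitted_at).total_seconds() / 3600.0


def accuracy_and_ppv(actual, predicted, positive="HAC"):
    """Accuracy and positive predictive value for the `positive` class.

    PPV = correctly flagged positives / all stays flagged as positive.
    """
    pairs = list(zip(actual, predicted))
    correct = sum(1 for a, p in pairs if a == p)
    flagged = sum(1 for _, p in pairs if p == positive)
    true_pos = sum(1 for a, p in pairs if a == positive and p == positive)
    accuracy = correct / len(pairs)
    ppv = true_pos / flagged if flagged else float("nan")
    return accuracy, ppv


# Toy values, invented purely to exercise the functions.
admitted = datetime(2014, 1, 1, 8, 0)
heparin_orders = [datetime(2014, 1, 1, 14, 30), datetime(2014, 1, 1, 10, 30)]
delay_h = hours_to_first_order(admitted, heparin_orders)

actual = ["HAC", "HAC", "POA", "HAC", "POA"]
predicted = ["HAC", "POA", "POA", "HAC", "HAC"]
acc, ppv = accuracy_and_ppv(actual, predicted)
print(delay_h, acc, round(ppv, 3))
```

Treating HAC as the positive class mirrors how the PPV is reported above: of the stays a model flags as HAC VTE, it is the share that truly are HAC.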