Abstract

Introduction
Clinical data repositories (CDRs) that include electronic health record (EHR) data have great potential for outcome prediction and risk modeling. However, most CDRs are used only to display data, and using CDR data for outcome prediction often requires careful study design and sophisticated modeling techniques before a hypothesis can be tested.

Purpose
We built a pattern-discovery-based prediction tool integrated with a CDR to bridge this gap, and we demonstrate it in a case study on contrast-related acute kidney injury (AKI).

Methods
A cardiovascular CDR integrated with multiple hospital informatics systems was established. For the AKI case study, we included patients who underwent cardiac catheterization from January 13, 2015 to April 27, 2017, excluding those with dialysis, end-stage renal disease, or renal transplant, and those missing a pre- or post-procedural creatinine measurement. To handle missing data, a prior-history-note composer was designed to fill in structured data on 14 diseases related to cardiovascular problems. Crucial data such as ejection fraction were extracted from structured reports. AKI was defined according to the Acute Kidney Injury Network (AKIN) criteria as an increase in serum creatinine from the most recent baseline to the peak within 7 days after the procedure. To build the predictive models, we selected 17 variables covered in existing AKI models. Pattern discovery is a recently developed interpretable predictive modeling approach that works on incomplete, noisy data. In this study, we developed a pattern-discovery-based visual analytics tool and trained it on 70% of the data (patients up to August 2016) using three interactive knowledge-incorporation modes, yielding 3 models: 1) pure data-driven, 2) domain knowledge, and 3) clinician-interactive. In the last two modes, a physician using the visual analytics tool could change the variables and further refine the model, respectively.
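As an illustration of the outcome definition, the following minimal sketch flags AKI from a baseline creatinine and the post-procedure values. The abstract does not state the exact cutoffs, so the thresholds here are an assumption taken from the published AKIN stage 1 criteria (a rise of at least 0.3 mg/dL, or a peak of at least 1.5 times baseline). The function name `has_aki` is hypothetical, and the post-procedure values are assumed to be already restricted to the 7-day window.

```python
from typing import Sequence

# Assumed cutoffs (AKIN stage 1); the abstract does not give the values used.
ABS_RISE_MG_DL = 0.3   # absolute rise in serum creatinine, mg/dL
REL_RISE_RATIO = 1.5   # peak relative to baseline


def has_aki(baseline_mg_dl: float, post_values_mg_dl: Sequence[float]) -> bool:
    """Return True if the post-procedure peak meets either assumed cutoff.

    post_values_mg_dl is assumed to hold only measurements taken within
    7 days after the procedure.
    """
    if baseline_mg_dl <= 0:
        raise ValueError("baseline creatinine must be positive")
    if not post_values_mg_dl:
        raise ValueError("at least one post-procedure value is required")

    peak = max(post_values_mg_dl)
    absolute_rise = peak - baseline_mg_dl
    return (absolute_rise >= ABS_RISE_MG_DL
            or peak >= REL_RISE_RATIO * baseline_mg_dl)


if __name__ == "__main__":
    print(has_aki(1.0, [1.1, 1.4]))   # absolute rise of about 0.4 mg/dL
    print(has_aki(1.0, [1.1, 1.2]))   # neither cutoff reached
```

Patients without a baseline or a post-procedure value raise an error in this sketch, which mirrors the exclusion of patients missing pre- or post-procedural creatinine.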
We tested these models and compared them with other models on the remaining 30% of patients, who were seen consecutively after the training period, as shown in Figure 1.

Results
Among 2,560 patients in the final dataset, with 17 pre-procedure variables derived from CDR data, 169 (7.3%) had AKI. We evaluated 4 existing models, whose areas under the receiver operating characteristic curve (AUCs) on the test set were 0.70 (Mehran's), 0.72 (Chen's), 0.67 (Gao's), and 0.62 (AGEF), respectively. A pure data-driven machine learning method (Easy Ensemble) achieved an AUC of 0.72. The AUCs of our 3 models were 0.77, 0.80, and 0.82, respectively; the clinician-interactive model, which incorporates physician knowledge, performed best.

Conclusions
We developed a novel pattern-discovery-based outcome prediction tool that is integrated with a CDR and uses only EHR data. In the case of predicting contrast-related AKI, the tool proved user-friendly for physicians and performed competitively with state-of-the-art models.
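The evaluation design described above (training on earlier patients, testing on later consecutive patients, and comparing models by AUC) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the function names, the record layout `(procedure_date, risk_score, had_aki)`, the cutoff date, and the toy records are all hypothetical, and the AUC is computed with the standard pairwise (Mann-Whitney) formulation.

```python
from datetime import date
from typing import List, Sequence, Tuple

# Hypothetical record layout: (procedure_date, risk_score, had_aki)
Record = Tuple[date, float, int]


def chronological_split(records: Sequence[Record],
                        cutoff: date) -> Tuple[List[Record], List[Record]]:
    """Split by date: training on or before the cutoff, test strictly after."""
    ordered = sorted(records, key=lambda r: r[0])
    train = [r for r in ordered if r[0] <= cutoff]
    test = [r for r in ordered if r[0] > cutoff]
    return train, test


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy records for illustration only; not study data.
    toy = [
        (date(2015, 3, 1), 0.20, 0),
        (date(2016, 5, 9), 0.70, 1),
        (date(2016, 9, 2), 0.10, 0),
        (date(2016, 11, 20), 0.40, 0),
        (date(2017, 1, 15), 0.35, 1),
        (date(2017, 4, 3), 0.80, 1),
    ]
    # Hypothetical cutoff; the abstract says only "up to August 2016".
    train, test = chronological_split(toy, cutoff=date(2016, 8, 31))
    auc = roc_auc([r[2] for r in test], [r[1] for r in test])
    print(len(train), len(test), auc)
```

Splitting by date rather than at random keeps later patients out of training, so the test AUC reflects how a model would perform on patients seen after it was built.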