Abstract

The use of data analysis techniques in electronic health records (EHRs) offers great promise in improving predictive risk modeling. Although useful, these analysis techniques often suffer from a lack of interpretability and transparency, especially when the data is high-dimensional. Visual analytics, a type of computational system that integrates data analysis techniques with interactive visualizations, has the potential to address these issues. This paper introduces a visual analytics system called VERONICA that uses the natural classification of features in EHRs to identify the group of features with the strongest predictive power. VERONICA incorporates a representative set of supervised machine learning techniques (classification and regression tree, C5.0, random forest, support vector machines, and naive Bayes) to support users in developing predictive models using EHRs. It then makes the analytics results accessible through an interactive visual interface. By integrating different sampling strategies, analytics algorithms, visualization techniques, and human-data interaction, VERONICA assists users in comparing prediction models in a systematic way. To demonstrate the usefulness of the proposed system, we use the clinical dataset stored at ICES to identify the best representative feature groups in detecting patients who are at high risk of developing acute kidney injury.

Highlights

  • A key component of precision medicine is to determine a person’s individualized estimates of different health outcomes, which guides therapy to increase the chance of long-term good health

  • The Analytics module utilizes the group structure of features stored in electronic health records (EHRs) to identify the subset of feature groups that best represent the data in the prediction of Acute Kidney Injury (AKI)

  • The Interactive Visualization module maps the results of the Analytics module into interactive visual representations to assist users in exploring the results

Summary

Introduction

A key component of precision medicine is to determine a person’s individualized estimates of different health outcomes, which guides therapy to increase the chance of long-term good health. Most existing studies use unsupervised learning techniques such as principal component analysis [6], K-means [7,8], and hierarchical clustering [9] to find the best representative group of features in high-dimensional EHRs [10–18]. Although these unsupervised techniques have shown promise in managing high-dimensional data, to the best of our knowledge this problem has not been studied thoroughly using supervised techniques [19,20]. To identify the subset with the strongest predictive power, VERONICA considers every possible subset of feature groups and applies several supervised learning techniques to each subset.
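The exhaustive search over feature groups described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not VERONICA's implementation: the group names, the column names, the learner labels, and the `placeholder_evaluate` function are all hypothetical, and the scoring function stands in for whatever training and validation procedure a real system would run.

```python
from itertools import combinations
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

# Hypothetical feature groups; names and columns are illustrative, not from the paper.
FEATURE_GROUPS: Dict[str, List[str]] = {
    "demographics": ["age", "sex"],
    "comorbidities": ["diabetes", "hypertension"],
    "medications": ["nsaid", "diuretic"],
}

# An evaluator takes a learner label and a column list and returns a score.
Evaluate = Callable[[str, List[str]], float]
Result = Tuple[float, str, Tuple[str, ...]]


def all_group_subsets(group_names: Sequence[str]) -> Iterator[Tuple[str, ...]]:
    """Yield every non-empty subset of group names, smallest subsets first."""
    for size in range(1, len(group_names) + 1):
        yield from combinations(group_names, size)


def columns_for(subset: Sequence[str], groups: Dict[str, List[str]]) -> List[str]:
    """Flatten the chosen groups into one list of feature columns."""
    return [column for name in subset for column in groups[name]]


def rank_subsets(
    groups: Dict[str, List[str]],
    learners: Sequence[str],
    evaluate: Evaluate,
) -> List[Result]:
    """Score every (learner, subset) pair and return the results, best first."""
    results: List[Result] = []
    for subset in all_group_subsets(list(groups)):
        columns = columns_for(subset, groups)
        for learner in learners:
            results.append((evaluate(learner, columns), learner, subset))
    return sorted(results, key=lambda item: item[0], reverse=True)


def placeholder_evaluate(learner: str, columns: List[str]) -> float:
    """Assumption: a dummy score standing in for real model training and validation."""
    return 1.0 / len(columns)


ranked = rank_subsets(FEATURE_GROUPS, ["cart", "naive_bayes"], placeholder_evaluate)
```

With k feature groups there are 2^k - 1 non-empty subsets, so the number of models to fit grows exponentially with the number of groups and linearly with the number of learners. The three groups and two learners here yield 14 candidates.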

Background
Visual Analytics
Analytics Module
Interactive Visualization Module
Decision Tree
Support Vector Machines
Naive Bayes
Class Imbalance Problem
Related Work
Design Process and Participants
Data Sources
Cohort Entry Criteria
Response Variable
Input Features
Implementation Details
Workflow
The Design of VERONICA
Analytics
Limitations
Conclusion and Future Work
Discharge Abstract
Findings
