Learning temporal weights of clinical events using variable importance.

Jing Zhao, Aron Henriksson

doi:10.1186/s12911-016-0311-6

Abstract

Background: Longitudinal data sources, such as electronic health records (EHRs), are very valuable for monitoring adverse drug events (ADEs). However, ADEs are heavily under-reported in EHRs. Using machine learning algorithms to automatically detect patients that should have had ADEs reported in their health records is an efficient and effective solution. One of the challenges to that end is how to take into account the temporality of clinical events, which are time-stamped in EHRs, and how to provide these as features for machine learning algorithms to exploit. Previous research on this topic suggests that representing EHR data as a bag of temporally weighted clinical events is promising; however, the weights were in that case pre-assigned according to the events' time stamps, which is limiting and potentially less accurate. This study therefore focuses on how to learn weights that effectively take into account the temporality and importance of clinical events for ADE detection.

Methods: Variable importance obtained from the random forest learning algorithm is used for extracting temporal weights. Two strategies are proposed for applying the learned weights: weighted aggregation and weighted sampling. The first strategy aggregates the weighted clinical events from different time windows to form new features; the second strategy retains the original features but samples them by using their weights as probabilities when building each tree in the forest. The predictive performance of random forest models using the learned weights with the two strategies is compared to that of models using pre-assigned weights. In addition, to assess the sensitivity of the weight-learning procedure, weights from different granularity levels are evaluated and compared.

Results: With the weighted sampling strategy, using learned weights significantly improves predictive performance in comparison to using pre-assigned weights; with the weighted aggregation strategy, however, there is no significant difference between the two. Moreover, the granularity of the weight-learning procedure has a significant impact on weighted sampling, but not on weighted aggregation.

Conclusions: Learning temporal weights is significantly beneficial in terms of predictive performance with the weighted sampling strategy. Moreover, weighted aggregation generally diminishes the impact of temporal weighting of the clinical events, irrespective of whether the weights are pre-assigned or learned.
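
The weighted aggregation strategy described in the abstract can be illustrated with a minimal sketch. It assumes that each original feature is a count of one clinical event in one time window, and that a variable importance score is already available for every such feature. The function names, the event codes, the importance values, and the normalization of importances into weights are all hypothetical choices made for illustration; they are not taken from the paper.

```python
from collections import defaultdict


def normalize(importances):
    """Scale raw importance scores so that they sum to 1.

    This normalization is an assumption of the sketch, not a detail
    confirmed by the paper.
    """
    total = sum(importances.values())
    if total == 0:
        raise ValueError("all importance scores are zero")
    return {key: value / total for key, value in importances.items()}


def weighted_aggregation(record, weights):
    """Collapse (event, window) counts into one weighted feature per event.

    record  -- maps (event, window) to how often the event occurs in that window
    weights -- maps (event, window) to the temporal weight of that feature
    """
    features = defaultdict(float)
    for (event, window), count in record.items():
        # Features without a weight contribute nothing to the aggregate.
        features[event] += weights.get((event, window), 0.0) * count
    return dict(features)


# Hypothetical importance scores for two events across time windows 0 and 1.
importances = {("drug_A", 0): 6.0, ("drug_A", 1): 3.0, ("dx_B", 0): 1.0}
weights = normalize(importances)

# Hypothetical event counts for one patient record.
record = {("drug_A", 0): 2, ("drug_A", 1): 1, ("dx_B", 0): 4}
aggregated = weighted_aggregation(record, weights)
```

After aggregation, each clinical event is represented by a single feature, so the window-level distinction is no longer visible to the learner. This loss of window-level detail is one plausible reading of why the abstract reports a diminished impact of temporal weighting under this strategy.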

Highlights

Longitudinal data sources, such as electronic health records (EHRs), are very valuable for monitoring adverse drug events (ADEs).
The experiments are conducted on 19 datasets, each corresponding to a specific ADE, extracted from the Stockholm EPR Corpus, a Swedish EHR database.
This study evaluates a scheme for learning temporal weights of clinical events, including two ways of applying the learned weights: Weighted Aggregation and Weighted Sampling. The former aggregates occurrences of the same clinical event from different time windows in accordance with the learned weights, while the latter uses the learned weights as sampling probabilities for the clinical events when constructing each tree in the random forest.
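
The Weighted Sampling idea can be sketched as follows: for each tree, a subset of features is drawn with probabilities proportional to the learned weights. The sketch makes several assumptions that the text above does not settle: features are drawn once per tree, without replacement, and features with zero weight are never selected. The function names, feature names, and weight values are hypothetical.

```python
import random


def sample_features(weights, k, rng):
    """Draw k distinct features, each draw proportional to the remaining weights."""
    candidates = {name: w for name, w in weights.items() if w > 0}
    if k > len(candidates):
        raise ValueError("k exceeds the number of features with positive weight")
    chosen = []
    for _ in range(k):
        names = list(candidates)
        pick = rng.choices(names, weights=[candidates[n] for n in names], k=1)[0]
        chosen.append(pick)
        # Remove the pick so that it cannot be drawn twice for the same tree.
        del candidates[pick]
    return chosen


def feature_subsets_for_forest(weights, n_trees, k, seed=0):
    """Return one weighted feature subset per tree in a hypothetical forest."""
    rng = random.Random(seed)
    return [sample_features(weights, k, rng) for _ in range(n_trees)]


# Hypothetical learned weights for features named "<event>@<time window>".
weights = {"drug_A@w1": 0.6, "drug_A@w2": 0.3, "dx_B@w1": 0.1, "dx_C@w1": 0.0}
subsets = feature_subsets_for_forest(weights, n_trees=5, k=2)
```

Each subset would then be handed to whatever routine grows a single tree. Unlike aggregation, this keeps the original window-level features and only changes how often each one is made available to the trees.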

Summary

Introduction

Longitudinal data sources, such as electronic health records (EHRs), are very valuable for monitoring adverse drug events (ADEs). One of the challenges to that end is how to take into account the temporality of clinical events, which are time-stamped in EHRs, and how to provide these as features for machine learning algorithms to exploit. Screening health records to identify ADEs is practically impossible given the massive amounts of data archived in an EHR database. To mitigate this problem, supervised machine learning can be adopted to automatically detect the presence of an ADE in health records in which it was not, but should have been, reported [8,9,10,11,12,13,14]. Clinical events are reported in chronological order in EHRs, and the same event often appears in the same health record several times at different time points.

Objectives

Methods

Results

Discussion

Conclusion