Ensembles of randomized trees using diverse distributed representations of clinical events.

Aron Henriksson, Hercules Dalianis, Jing Zhao, Henrik Boström

doi:10.1186/s12911-016-0309-0

Open Access

https://doi.org/10.1186/s12911-016-0309-0

Abstract

Background: Learning deep representations of clinical events based on their distributions in electronic health records has been shown to allow for subsequent training of higher-performing predictive models compared to the use of shallow, count-based representations. The predictive performance may be further improved by utilizing multiple representations of the same events, which can be obtained by, for instance, manipulating the representation learning procedure. The question, however, remains how to make best use of a set of diverse representations of clinical events – modeled in an ensemble of semantic spaces – for the purpose of predictive modeling.

Methods: Three different ways of exploiting a set of (ten) distributed representations of four types of clinical events – diagnosis codes, drug codes, measurements, and words in clinical notes – are investigated in a series of experiments using ensembles of randomized trees. Here, the semantic space ensembles are obtained by varying the context window size in the representation learning procedure. The proposed method trains a forest wherein each tree is built from a bootstrap replicate of the training set whose entire original feature set is represented in a randomly selected set of semantic spaces – corresponding to the considered data types – of a given context window size.

Results: The proposed method significantly outperforms concatenating the multiple representations of the bagged dataset; it also significantly outperforms representing, for each decision tree, only a subset of the features in a randomly selected set of semantic spaces. A follow-up analysis indicates that the proposed method exhibits less diversity while significantly improving average tree performance. It is also shown that the size of the semantic space ensemble has a significant impact on predictive performance and that performance tends to improve as the size increases.

Conclusions: The strategy for utilizing a set of diverse distributed representations of clinical events when constructing ensembles of randomized trees has a significant impact on predictive performance. The most successful strategy – significantly outperforming the considered alternatives – involves randomly sampling distributed representations of the clinical events when building each decision tree in the forest.
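The per-tree sampling described in the Methods part of the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `embed_instance` and `per_tree_training_sets`, the toy event codes, the two-dimensional vectors, and the choice to sum event vectors per data type are all assumptions made for the example. The sketch only prepares one training set per tree (a bootstrap replicate represented in the semantic spaces of one randomly chosen context window size); fitting the randomized trees themselves is left to whichever tree learner is used.

```python
import random

def embed_instance(events_by_type, spaces):
    """Represent one instance in a set of semantic spaces (one per data type).

    Assumption for illustration: event vectors are summed per data type and
    the per-type sums are concatenated in sorted data-type order.
    """
    features = []
    for data_type in sorted(spaces):
        space = spaces[data_type]
        dim = len(next(iter(space.values())))
        total = [0.0] * dim
        for event in events_by_type.get(data_type, []):
            vec = space.get(event)
            if vec is not None:  # events unseen in this space are skipped
                total = [t + v for t, v in zip(total, vec)]
        features.extend(total)
    return features

def per_tree_training_sets(instances, labels, ensemble, n_trees, seed=0):
    """Build one (window, X, y) training set per tree.

    ensemble maps context window size -> {data_type: {event: vector}}.
    """
    rng = random.Random(seed)
    n = len(instances)
    window_sizes = sorted(ensemble)
    training_sets = []
    for _ in range(n_trees):
        # Bootstrap replicate: sample n instances with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        # Randomly select the semantic spaces of one context window size.
        window = rng.choice(window_sizes)
        spaces = ensemble[window]
        # The entire feature set of the replicate is represented in those spaces.
        X = [embed_instance(instances[i], spaces) for i in idx]
        y = [labels[i] for i in idx]
        training_sets.append((window, X, y))
    return training_sets

# Toy ensemble with two window sizes and two data types (hypothetical values).
ensemble = {
    1: {"diagnosis": {"dx1": [1.0, 0.0], "dx2": [0.0, 1.0]},
        "drug": {"drug1": [0.5, 0.5]}},
    5: {"diagnosis": {"dx1": [0.2, 0.8], "dx2": [0.6, 0.4]},
        "drug": {"drug1": [0.1, 0.9]}},
}
instances = [
    {"diagnosis": ["dx1"], "drug": ["drug1"]},
    {"diagnosis": ["dx1", "dx2"], "drug": []},
    {"diagnosis": ["dx2"], "drug": ["drug1"]},
]
labels = [1, 0, 1]

training_sets = per_tree_training_sets(instances, labels, ensemble, n_trees=5, seed=1)
```

Each element of `training_sets` could then be passed to a single tree learner, so that different trees in the forest see the same events through different semantic spaces.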

Highlights

Learning deep representations of clinical events based on their distributions in electronic health records has been shown to allow for subsequent training of higher-performing predictive models compared to the use of shallow, count-based representations.
We have previously proposed a means of representing heterogeneous data types by first learning deep representations of clinical events based on their distribution in electronic health record (EHR) data.
We investigate alternative ways of making use of semantic space ensembles in conjunction with the ensemble methods used in the random forest learning algorithm, namely bagging and random subspacing.

Summary

Introduction

Learning deep representations of clinical events based on their distributions in electronic health records has been shown to allow for subsequent training of higher-performing predictive models compared to the use of shallow, count-based representations. The high dimensionality of the data, in turn, typically renders it extremely sparse since patients, within a given care episode, are only exposed to a very small subset of the clinical events used for describing the training sample. This is known as the curse of dimensionality and makes it difficult to apply statistical methods to healthcare data. Structured EHR data includes diagnosis codes (in the form of, e.g., ICD), drug codes (in the form of, e.g., ATC) and measurements (typically in the form of institution-specific encoding). Using these data types inevitably gives rise to questions of representation, how to handle values missing at random or not, and how to take into account the temporality of clinical events. These issues have been addressed in a number of studies [2,3,4,5,6,7].

Journal: BMC Medical Informatics and Decision Making	Publication Date: Jul 1, 2016
Citations: 23	License type: cc-by
