Patient Representation Learning From Heterogeneous Data Sources and Knowledge Graphs Using Deep Collective Matrix Factorization: Evaluation Study.

Sajit Kumar, Vaibhav Rajan, Alicia Nanelia, Ragunathan Mariappan, Adithya Rajagopal

doi:10.2196/28842

Open Access

Journal: JMIR Medical Informatics | Publication Date: Jan 20, 2022
Citations: 2 | License type: CC BY

Abstract

Background
Patient representation learning aims to learn features, also called representations, from input sources automatically, often in an unsupervised manner, for use in predictive models. This obviates the need for cumbersome, time- and resource-intensive manual feature engineering, especially from unstructured data such as text, images, or graphs. Most previous techniques have used neural network–based autoencoders to learn patient representations, primarily from clinical notes in electronic medical records (EMRs). Knowledge graphs (KGs), with clinical entities as nodes and their relations as edges, can be extracted automatically from biomedical literature. They provide information complementary to EMR data and have been found to provide valuable predictive signals.

Objective
This study aims to evaluate the efficacy of collective matrix factorization (CMF), both the classical variant and a recent neural architecture called deep CMF (DCMF), in integrating heterogeneous data sources from EMRs and KGs to obtain patient representations for clinical decision support tasks.

Methods
Using a recent formulation for obtaining graph representations through matrix factorization within the CMF framework, we infused auxiliary information into patient representation learning. We also extended the DCMF architecture to create a task-specific end-to-end model that simultaneously learns effective patient representations and makes predictions. We compared this model with a two-stage approach that first learns unsupervised representations and then independently trains a predictive model. We evaluated CMF-based methods and autoencoders for patient representation learning on 2 clinical decision support tasks using a large EMR data set.

Results
Our experiments show that DCMF provides a seamless way of integrating multiple data sources to obtain patient representations in both unsupervised and supervised settings. Its performance in single-source settings is comparable with that of previous autoencoder-based representation learning methods. When DCMF is used to obtain representations from a combination of EMR and KG data, a setting in which most previous autoencoder-based methods cannot be applied directly, it outperforms previous nonneural CMF methods. Infusing information from KGs into patient representations using DCMF improved downstream predictive performance.

Conclusions
Our experiments indicate that DCMF is a versatile model that can be used to obtain representations from single or multiple data sources and to combine information from EMR data and KGs. Furthermore, DCMF can learn representations in both supervised and unsupervised settings. Thus, DCMF offers an effective way of integrating heterogeneous data sources and infusing auxiliary knowledge into patient representations.
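To make the idea of collective matrix factorization more concrete, here is a minimal sketch of the classical (nonneural) variant. It is an illustration under stated assumptions, not the authors' implementation. It assumes an EMR matrix X (patients × clinical concepts) and a KG-derived matrix A (concepts × concepts) defined over the same concepts, so the concept factor V is shared between the two sources: X ≈ U V^T and A ≈ V W^T. The rows of U then act as patient representations that absorb KG information through V. The function name `collective_mf`, the squared-error loss, the ridge penalty `lam`, alternating least squares as the solver, and the use of A directly (rather than the matrix factorization–based graph formulation mentioned in the Methods) are all hypothetical choices made for illustration. DCMF replaces these linear factors with a neural architecture.

```python
import numpy as np


def collective_mf(X, A, rank=5, lam=0.1, n_iter=100, seed=0):
    """Classical CMF via alternating least squares (illustrative sketch only).

    X : (n_patients, n_concepts) EMR matrix (assumed fully observed).
    A : (n_concepts, n_concepts) KG-derived matrix over the same concepts.
    Model (an assumption for this sketch): X ~ U @ V.T and A ~ V @ W.T,
    with the concept factor V shared by both sources.
    Returns patient representations U, factors V and W, and the
    objective value recorded after each sweep.
    """
    rng = np.random.default_rng(seed)
    n_patients, n_concepts = X.shape
    U = rng.normal(scale=0.1, size=(n_patients, rank))
    V = rng.normal(scale=0.1, size=(n_concepts, rank))
    W = rng.normal(scale=0.1, size=(n_concepts, rank))
    I = np.eye(rank)
    history = []
    for _ in range(n_iter):
        # Ridge solve for U with V fixed: U (V^T V + lam I) = X V
        G = V.T @ V + lam * I
        U = np.linalg.solve(G, (X @ V).T).T
        # Ridge solve for W with V fixed: W (V^T V + lam I) = A^T V
        W = np.linalg.solve(G, (A.T @ V).T).T
        # The shared factor V receives signal from both sources:
        # V (U^T U + W^T W + lam I) = X^T U + A W
        H = U.T @ U + W.T @ W + lam * I
        V = np.linalg.solve(H, (X.T @ U + A @ W).T).T
        loss = (np.sum((X - U @ V.T) ** 2)
                + np.sum((A - V @ W.T) ** 2)
                + lam * (np.sum(U ** 2) + np.sum(V ** 2) + np.sum(W ** 2)))
        history.append(loss)
    return U, V, W, history


# Usage on synthetic data (hypothetical, not the study's EMR data set).
rng = np.random.default_rng(42)
U_true = rng.normal(size=(60, 3))
V_true = rng.normal(size=(20, 3))
X = U_true @ V_true.T   # stand-in "EMR" matrix: 60 patients x 20 concepts
A = V_true @ V_true.T   # stand-in symmetric "KG" matrix over the 20 concepts
U, V, W, history = collective_mf(X, A, rank=3)
print(U.shape)  # (60, 3)
```

Because each block update exactly minimizes the objective with the other factors fixed, the recorded loss does not increase from sweep to sweep. With this particular loss, the same model could be fit by factorizing the stacked matrix [X; A^T]; the updates are written out separately to show how V couples the two sources. In the two-stage setting compared in the study, U would then be passed to a separate predictive model, whereas the end-to-end DCMF variant learns representations and predictions jointly.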

Similar Papers

Illustrating the patient journey through the care continuum: Leveraging structured primary care electronic medical record (EMR) data in Ontario, Canada using chronic obstructive pulmonary disease as a case study
Jennifer Rayner ... Chen Wu
International Journal of Medical Informatics | VOL. 140
19 May 2020

Can Linked Electronic Medical Record and Administrative Data Help Us Identify Those Living with Frailty?
Sabrina Wong ... Alexander Singer
International Journal of Population Data Science | VOL. 5
14 Oct 2020

OP0010 Use of claims and electronic medical record data to predict RA disease activity
C.H Feldman ... M.E Weinblatt
Annals of the Rheumatic Diseases | VOL. 77
01 Jun 2018

Chronic Disease Case Definitions for Electronic Medical Records: A Canadian Validation Study
Lisa Lix ... Alexander Singer
International Journal of Population Data Science | VOL. 1
18 Apr 2017