Utilization of Electronic Medical Records and Biomedical Literature to Support the Diagnosis of Rare Diseases Using Data Fusion and Collaborative Filtering Approaches.

Feichen Shen, Sijia Liu, Hongfang Liu, Liwei Wang, Andrew Wen, Yanshan Wang

doi:10.2196/11301

Abstract

Background
In the United States, a rare disease is defined as one affecting no more than 200,000 patients at a given time. Patients with rare diseases are often misdiagnosed or left undiagnosed, possibly because clinical practitioners lack sufficient knowledge of or experience with the disease. With the volume of electronically accessible medical data growing exponentially, a large amount of information on thousands of rare diseases and their potentially associated diagnostic information is buried in electronic medical records (EMRs) and the medical literature.

Objective
This study aimed to leverage information contained in heterogeneous datasets to assist rare disease diagnosis. Phenotypic information about patients contained in EMRs and the biomedical literature could be more fully leveraged to speed up the diagnosis of diseases.

Methods
In our previous work, we advanced the use of a collaborative filtering recommendation system to support rare disease diagnostic decision making based on phenotypes derived solely from EMR data. However, that work did not examine the effect of using heterogeneous data with collaborative filtering, an essential question when facing large volumes of data from various resources. To investigate the performance of collaborative filtering on heterogeneous datasets, in this study we used EMR data generated at Mayo Clinic as well as published article abstracts retrieved from the Semantic MEDLINE Database. Specifically, we designed different strategies for fusing data from these heterogeneous resources and integrated them with the collaborative filtering model.

Results
We evaluated the performance of the proposed system using characterizations derived from various combinations of EMR data and literature, as well as from EMR data alone. We extracted nearly 13 million EMRs from the patient cohort generated between 2010 and 2015 at Mayo Clinic and retrieved all article abstracts from the semistructured Semantic MEDLINE Database that were published through the end of 2016. We applied a collaborative filtering model and compared the performance obtained with different similarity metrics. Log likelihood ratio similarity combined with k-nearest neighbor on heterogeneous datasets showed the best performance in patient recommendation, with area under the precision-recall curve (PRAUC) of 0.475 (string match), 0.511 (Systematized Nomenclature of Medicine [SNOMED] match), and 0.752 (Genetic and Rare Diseases Information Center [GARD] match). Log likelihood ratio similarity also performed best on mean average precision, with 0.465 (string match), 0.5 (SNOMED match), and 0.749 (GARD match). The performance of rare disease prediction was also demonstrated using the optimal algorithm: the macro-averaged F-measures for string, SNOMED, and GARD match were 0.32, 0.42, and 0.63, respectively.

Conclusions
This study demonstrated the potential of using heterogeneous datasets in a collaborative filtering model to support rare disease diagnosis. Beyond phenotype-based analysis, in the future we plan to further resolve the heterogeneity issue and reduce miscommunication between EMR and literature by mining genotypic information to establish a comprehensive disease-phenotype-gene network for rare disease diagnosis.
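The Results name log likelihood ratio (LLR) similarity combined with k-nearest neighbor (kNN) as the best-performing configuration. The Python sketch below shows one way such a pipeline could work; it is not the authors' implementation. It assumes each patient (or literature-derived profile) is a binary set of phenotypes, computes Dunning's LLR on the 2x2 co-occurrence table of two profiles, maps it into [0, 1) with 1 - 1/(1 + LLR) (the transform used by Apache Mahout's LogLikelihoodSimilarity), and ranks candidate diagnoses by similarity-weighted votes from the k nearest profiles. The function names, the vocabulary size `n_phenotypes`, the value of `k`, and the example data are all hypothetical.

```python
import math
from collections import defaultdict


def x_log_x(x):
    """x * ln(x), with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized Shannon entropy of raw counts (as in Dunning's G^2 test)."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """Dunning's log likelihood ratio (G^2) for a 2x2 contingency table."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))  # clamp tiny negative float error


def llr_similarity(a, b, n_phenotypes):
    """Similarity in [0, 1) between two binary phenotype profiles (sets)."""
    k11 = len(a & b)                      # phenotypes in both profiles
    k12 = len(a - b)                      # only in a
    k21 = len(b - a)                      # only in b
    k22 = n_phenotypes - k11 - k12 - k21  # in neither profile
    return 1.0 - 1.0 / (1.0 + log_likelihood_ratio(k11, k12, k21, k22))


def recommend_diagnoses(query, profiles, diagnoses, n_phenotypes, k=2):
    """Rank diagnoses by the summed similarity of the k most similar profiles."""
    neighbours = sorted(
        ((llr_similarity(query, ph, n_phenotypes), pid) for pid, ph in profiles.items()),
        reverse=True,
    )[:k]
    scores = defaultdict(float)
    for sim, pid in neighbours:
        scores[diagnoses[pid]] += sim
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: three known profiles over a 10-phenotype vocabulary.
profiles = {
    "p1": {"seizure", "ataxia", "hypotonia"},
    "p2": {"seizure", "ataxia"},
    "p3": {"rash", "fever"},
}
diagnoses = {"p1": "disease_A", "p2": "disease_A", "p3": "disease_B"}
query = {"seizure", "ataxia", "hypotonia"}

ranking = recommend_diagnoses(query, profiles, diagnoses, n_phenotypes=10, k=2)
print(ranking)  # disease_A is ranked first; p3 is not among the 2 nearest
```

Because LLR measures how far co-occurrence departs from independence, it also rewards strong negative association: in this example the disjoint profile p3 still scores about 0.62, which is why restricting the vote to the k nearest profiles matters.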

Highlights

In the United States, a rare disease is described as one affecting no more than 200,000 patients at a given time [1]
We evaluated the performance of the proposed system using characterizations derived from various combinations of electronic medical record (EMR) data and literature, as well as from EMR data alone
Log likelihood ratio similarity combined with k-nearest neighbor on heterogeneous datasets showed the best performance in patient recommendation, with area under the precision-recall curve (PRAUC) of 0.475 (string match), 0.511 (Systematized Nomenclature of Medicine [SNOMED] match), and 0.752 (Genetic and Rare Diseases Information Center [GARD] match)
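The study reports PRAUC and mean average precision (MAP) for patient recommendation and a macro-averaged F-measure for rare disease prediction. The sketch below computes these quantities in their textbook forms; it is not the authors' evaluation code. Average precision is used here as a common step-wise estimate of PRAUC, which may differ from the estimator used in the study. Relevance labels would come from one of the string, SNOMED, or GARD matching criteria, whose exact rules are not given on this page, so the labels and diagnoses in the example are hypothetical.

```python
from collections import Counter


def average_precision(ranked_relevance, n_relevant=None):
    """Average precision of one ranked list of 0/1 relevance labels.

    If n_relevant (total relevant items) is given, relevant items missing
    from the list count as zero precision; otherwise the average is taken
    over the relevant items that were retrieved.
    """
    hits, total = 0, 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / rank  # precision at this relevant rank
    denom = n_relevant if n_relevant is not None else hits
    return total / denom if denom else 0.0


def mean_average_precision(rankings):
    """Mean of per-query average precision."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


def macro_f1(gold, predicted):
    """Unweighted mean of per-class F1 over all labels in gold or predicted."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1
    labels = set(gold) | set(predicted)
    f_scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        f_scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(f_scores) / len(f_scores)


# Hypothetical relevance labels for two query patients' recommendation lists.
rankings = [[1, 0, 1, 0], [0, 1, 0, 0]]
print(mean_average_precision(rankings))  # (5/6 + 1/2) / 2 = 2/3

# Hypothetical gold vs predicted diagnoses for four patients.
gold = ["A", "A", "B", "C"]
predicted = ["A", "B", "B", "C"]
print(macro_f1(gold, predicted))  # (2/3 + 2/3 + 1) / 3 = 7/9
```

Macro-averaging gives every disease equal weight regardless of how many patients carry it, which suits rare diseases where per-class counts are small.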

Summary

Introduction

In the United States, a rare disease is described as one affecting no more than 200,000 patients at a given time [1]. With computationally accessible medical data growing at an exponential rate, an abundance of rare disease-related phenotypic information is believed to be buried in electronic medical records (EMRs) and the medical literature. In our previous study, we proposed the use of collaborative filtering for rare disease diagnosis [5], because making a diagnostic decision for a patient based on phenotype is analogous to recommending an online product according to customers' previous purchases in e-commerce [6,7,8].
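The e-commerce analogy maps patients to users and phenotypes to items, so the collaborative filtering input is a binary patient-by-phenotype matrix. This page does not describe the study's data fusion strategies, so the sketch below shows only one hypothetical strategy: literature-derived disease-phenotype pairs (for example, pairs extracted from Semantic MEDLINE) are aggregated into one pseudo-patient profile per disease, keeping pairs seen at least `min_support` times, and appended to the EMR patient rows. The function names, the `lit:` row prefix, the support threshold, and the example data are assumptions for illustration.

```python
from collections import Counter, defaultdict


def fuse_profiles(emr_patients, literature_pairs, min_support=1):
    """Append literature-derived disease profiles to EMR patient profiles.

    emr_patients: dict patient_id -> (diagnosis, iterable of phenotypes)
    literature_pairs: iterable of (disease, phenotype) pairs; repeats add support
    Returns dict profile_id -> (diagnosis, frozenset of phenotypes).
    """
    fused = {pid: (dx, frozenset(ph)) for pid, (dx, ph) in emr_patients.items()}
    lit_profiles = defaultdict(set)
    for (disease, phenotype), count in Counter(literature_pairs).items():
        if count >= min_support:  # drop weakly supported literature pairs
            lit_profiles[disease].add(phenotype)
    for disease, phenotypes in lit_profiles.items():
        fused[f"lit:{disease}"] = (disease, frozenset(phenotypes))  # pseudo-patient row
    return fused


def to_binary_matrix(profiles):
    """Build a 0/1 profile-by-phenotype matrix (rows = users, columns = items)."""
    vocab = sorted(set().union(*(ph for _, ph in profiles.values())))
    column = {phenotype: j for j, phenotype in enumerate(vocab)}
    rows = sorted(profiles)
    matrix = [[0] * len(vocab) for _ in rows]
    for i, pid in enumerate(rows):
        for phenotype in profiles[pid][1]:
            matrix[i][column[phenotype]] = 1
    return rows, vocab, matrix


# Hypothetical inputs: two EMR patients and three literature pairs.
emr = {
    "pt1": ("disease_A", {"seizure", "ataxia"}),
    "pt2": ("disease_B", {"rash"}),
}
literature = [
    ("disease_A", "hypotonia"),
    ("disease_A", "hypotonia"),
    ("disease_A", "fever"),
]
rows, vocab, matrix = to_binary_matrix(fuse_profiles(emr, literature, min_support=2))
for pid, row in zip(rows, matrix):
    print(pid, row)
```

Here "fever" is dropped because it appears only once in the literature pairs, so the matrix holds a pseudo-patient row `lit:disease_A` alongside the two EMR rows. A different fusion strategy would change only `fuse_profiles`; the similarity and kNN steps downstream stay the same.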

Journal: JMIR Medical Informatics
Publication Date: Oct 10, 2018
Citations: 30
License type: CC BY

Similar Papers

Illustrating the patient journey through the care continuum: Leveraging structured primary care electronic medical record (EMR) data in Ontario, Canada using chronic obstructive pulmonary disease as a case study
Jennifer Rayner ... Chen Wu
International journal of bio-medical computing | VOL. 140
19 May 2020

Can Linked Electronic Medical Record and Administrative Data Help Us Identify Those Living with Frailty?
Sabrina Wong ... Carole Taylor
International Journal for Population Data Science | VOL. 5
14 Oct 2020

Identifying and categorizing spurious weight data in electronic medical records
Sunny Chen ... Stephen M Thielke
The American Journal of Clinical Nutrition | VOL. 107
01 Mar 2018

OP0010 Use of claims and electronic medical record data to predict RA disease activity
C.H Feldman ... M.E Weinblatt
Annals of the Rheumatic Diseases | VOL. 77
01 Jun 2018
