Redundancy-Aware Topic Modeling for Patient Record Notes

Raphael Cohen, Michael Elhadad, Noémie Elhadad, Iddo Aviram

doi:10.1371/journal.pone.0087555


Open Access


Journal: PLoS ONE	Publication Date: Feb 13, 2014
Citations: 85	License type: CC BY 4.0

Affiliation: Ben-Gurion University of the Negev, Columbia University

Abstract

The clinical notes in a given patient record contain much redundancy, in large part due to clinicians’ documentation habit of copying from previous notes in the record and pasting into a new note. Previous work has shown that this redundancy has a negative impact on the quality of text mining, and of topic modeling in particular. In this paper we describe a novel variant of Latent Dirichlet Allocation (LDA) topic modeling, Red-LDA, which takes into account the inherent redundancy of patient records when modeling the content of clinical notes. To assess the value of Red-LDA, we compare three baselines with our novel redundancy-aware topic modeling method. Given a large collection of patient records, we (i) apply vanilla LDA to all documents in all input records; (ii) identify and remove all redundancy by choosing a single representative document for each record as input to LDA; (iii) identify and remove all redundant paragraphs in each record, leaving partial, non-redundant documents as input to LDA; and (iv) apply Red-LDA to all documents in all input records. Both the quantitative evaluation (log-likelihood on held-out data and coherence of the produced topics) and the qualitative assessment of topics by physicians show that Red-LDA produces better models than all three baseline strategies. This research contributes to the emerging field of understanding the characteristics of the electronic health record and how to account for them in the framework of data mining. The code for the two redundancy-elimination baselines and Red-LDA is made publicly available to the community.
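The two redundancy-elimination baselines, (ii) and (iii), can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' released code: the helper names (`representative_note`, `dedup_paragraphs`, `split_paragraphs`, `normalize`) are hypothetical, the longest note stands in for the "representative document" of baseline (ii), and baseline (iii) treats a paragraph as redundant only when its lowercased, whitespace-normalized text already appeared earlier in the same record. A record is assumed to be a chronologically ordered list of note strings whose paragraphs are separated by blank lines.

```python
import re
from typing import List


def normalize(paragraph: str) -> str:
    """Lowercase and collapse whitespace so trivially reformatted copies still match."""
    return re.sub(r"\s+", " ", paragraph.strip().lower())


def split_paragraphs(note: str) -> List[str]:
    """Split a note on blank lines and drop empty fragments."""
    return [p for p in re.split(r"\n\s*\n", note) if p.strip()]


def representative_note(record: List[str]) -> str:
    """Baseline (ii) sketch: keep a single note per record.
    Assumption (hypothetical choice): the longest note, by whitespace tokens."""
    return max(record, key=lambda note: len(note.split()))


def dedup_paragraphs(record: List[str]) -> List[str]:
    """Baseline (iii) sketch: walk the notes in order and drop every paragraph whose
    normalized text was already seen earlier in the record (exact match only)."""
    seen = set()
    partial_notes = []
    for note in record:
        kept = []
        for para in split_paragraphs(note):
            key = normalize(para)
            if key not in seen:
                seen.add(key)
                kept.append(para)
        partial_notes.append("\n\n".join(kept))
    return partial_notes


record = [
    "Pt admitted with chest pain.\n\nHx: HTN, DM.",
    "Pt stable overnight.\n\nHx:  HTN, DM.",  # history paragraph copied forward
]
print(dedup_paragraphs(record))
# ['Pt admitted with chest pain.\n\nHx: HTN, DM.', 'Pt stable overnight.']
```

Copy-pasted text in real notes is often edited or only partially copied, so exact matching likely under-detects redundancy; a fuzzier paragraph-similarity test would be a natural substitute in this sketch.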

Highlights

The information contained in the electronic health record for a given patient is quite redundant
We describe a novel variant of Latent Dirichlet Allocation (LDA) topic modeling, redundancy-aware LDA (Red-LDA), which takes into account the inherent redundancy of clinical notes within a given patient record, and produces better topic models, as shown through quantitative and qualitative evaluation
To assess the value of handling redundancy explicitly as part of the topic modeling task of clinical notes, we compared redundancy-aware LDA (Red-LDA) to alternative methods according to two established quantitative metrics for evaluating topic models (log-likelihood and topic coherence) and a qualitative review of the generated topics by clinical experts
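The highlights do not say which topic coherence measure was used. One widely used, corpus-internal option is the UMass coherence of Mimno et al. (2011): for a topic's top words v_1, ..., v_M (ordered by probability), it sums log((D(v_m, v_l) + 1) / D(v_l)) over all pairs l < m, where D(...) counts the documents that contain all of the given words. The sketch below assumes that measure; `umass_coherence` is a hypothetical helper name, and documents are assumed to be pre-tokenized lists of words.

```python
import math
from typing import Iterable, Sequence


def umass_coherence(top_words: Sequence[str], documents: Iterable[Iterable[str]]) -> float:
    """UMass coherence of one topic, given its top words in descending probability."""
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words: str) -> int:
        # Number of documents that contain every one of `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for i in range(m):  # every higher-ranked word v_i paired with v_m
            d_i = doc_freq(top_words[i])
            if d_i == 0:
                raise ValueError(f"top word {top_words[i]!r} never occurs in the corpus")
            score += math.log((doc_freq(top_words[m], top_words[i]) + 1) / d_i)
    return score


docs = [["chest", "pain", "troponin"], ["chest", "pain"], ["insulin", "glucose"]]
print(umass_coherence(["chest", "pain", "troponin"], docs))  # log(3/2) ≈ 0.405
print(umass_coherence(["chest", "insulin"], docs))           # log(1/2) ≈ -0.693
```

Higher scores indicate that a topic's top words tend to co-occur in the same documents. Because the counts come from the corpus itself, heavy copy-paste could in principle inflate co-document frequencies, so the choice of reference corpus matters when comparing strategies.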

Summary

Introduction

The information contained in the electronic health record for a given patient is quite redundant. We have shown through a quantitative analysis that redundancy hurts standard text-mining tools, such as collocation identification and topic modeling [3]. Topic modeling with Latent Dirichlet Allocation (LDA) [4] is a popular unsupervised method for discovering latent semantic properties of a document collection. Topic modeling has been shown to help in a large number of tasks, including document classification and clustering, multi-document summarization [5], search [6], document labeling [7,8], and information extraction [9]. The extent of LDA's sensitivity to different kinds of noise is not well understood, especially as various methods are used for evaluating the produced topic models [11,12].
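To make "fitting LDA" concrete, the sketch below implements a standard collapsed Gibbs sampler for vanilla LDA (Griffiths and Steyvers, 2004). It is not Red-LDA, whose redundancy-aware modifications are not described in this section, and this section does not say which inference procedure the authors use. The function name `lda_gibbs`, the hyperparameter defaults, and the integer-encoded input format are illustrative assumptions.

```python
import random
from typing import List, Tuple


def lda_gibbs(
    docs: List[List[int]],
    vocab_size: int,
    num_topics: int,
    alpha: float = 0.1,
    beta: float = 0.01,
    iters: int = 200,
    seed: int = 0,
) -> Tuple[List[List[float]], List[List[int]]]:
    """Collapsed Gibbs sampling for vanilla LDA.

    docs: each document is a list of word ids in [0, vocab_size).
    Returns (phi, z): smoothed topic-word distributions and the final
    topic assignment of every token.
    """
    rng = random.Random(seed)
    topics = range(num_topics)
    ndk = [[0] * num_topics for _ in docs]    # tokens assigned to topic k in doc d
    nkw = [[0] * vocab_size for _ in topics]  # occurrences of word w assigned to topic k
    nk = [0] * num_topics                     # total tokens assigned to topic k

    # Random initial assignment of every token to a topic.
    z = []
    for d, doc in enumerate(docs):
        z_d = [rng.randrange(num_topics) for _ in doc]
        for w, k in zip(doc, z_d):
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional: p(z=k | rest) ∝ (n_dk + α)(n_kw + β) / (n_k + Vβ).
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    # Posterior-mean estimate of each topic's word distribution.
    phi = [
        [(nkw[k][w] + beta) / (nk[k] + vocab_size * beta) for w in range(vocab_size)]
        for k in topics
    ]
    return phi, z


# Usage: two word groups (ids 0-2 and 3-5) that never share a document.
docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 2, 1, 2], [5, 3, 4, 5]]
phi, z = lda_gibbs(docs, vocab_size=6, num_topics=2)
for k, row in enumerate(phi):
    print(k, sorted(range(len(row)), key=row.__getitem__, reverse=True)[:3])
```

Because the sampler's counts are additive, a paragraph pasted into several notes contributes its tokens once per copy to the topic-word counts; this is one plausible intuition, not a claim made in this section, for why unhandled redundancy can distort the topics of vanilla LDA.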



Similar Papers


An intelligent literature review: adopting inductive approach to define machine learning applications in the clinical domain
Renu Sabharwal ... Shah J Miah
Journal of Big Data | VOL. 9 | 28 Apr 2022

Sentiment Analysis of Consumer-Generated Online Reviews of Physical Bookstores Using Hybrid LSTM-CNN and LDA Topic Model
Yan Wang ... Xiaoyu Chang
01 Oct 2020

Text Similarity Computing Based on LDA Topic Model and Word Co-occurrence
Liangxi Qin ... Minglai Shao
01 Jan 2014

Analyze IMDb movies by sentiment and topic analysis
Ningjing Ouyang
Environment and Social Psychology | VOL. 8 | 25 Oct 2023
