Leveraging medical context to recommend semantically similar terms for chart reviews

Cheng Ye, Bradley A Malin, Daniel Fabbri

doi:10.1186/s12911-021-01724-2

Abstract

Background: Information retrieval (IR) helps clinicians answer questions posed to large collections of electronic medical records (EMRs), such as how best to identify a patient’s cancer stage. One of the more promising approaches to IR for EMRs is to expand a keyword query with similar terms (e.g., augmenting cancer with mets). However, clinical chart review tasks vary so widely that a fixed set of similar terms is insufficient. Current language models, such as Bidirectional Encoder Representations from Transformers (BERT) embeddings, do not capture the full non-textual context of a task. In this study, we present new methods that provide similar terms dynamically by adjusting to the context of the chart review task.

Methods: We introduce a vector space for medical context in which each word is represented by a vector that captures the word’s usage in different medical contexts (e.g., how frequently cancer is used when ordering a prescription versus describing family history) beyond the context learned from the surrounding text. These vectors are transformed into a vector space for customizing the set of similar terms selected for different chart review tasks. We evaluate the vector space model with multiple chart review tasks, in which supervised machine learning models learn to predict the preferred terms of clinically knowledgeable reviewers. To quantify the usefulness of the predicted similar terms relative to a baseline of standard word2vec embeddings, we measure (1) the prediction performance of the medical-context vector space model using the area under the receiver operating characteristic curve (AUROC) and (2) the labeling effort required to train the models.

Results: The vector space outperformed the baseline word2vec embeddings in all three chart review tasks, with an average AUROC of 0.80 versus 0.66, respectively. Additionally, the medical-context vector space significantly reduced the number of labels required to learn and predict the preferred similar terms of reviewers. Specifically, the labeling effort was reduced to 10% of the entire dataset in all three tasks.

Conclusions: The set of preferred similar terms that are relevant to a chart review task can be learned by leveraging the medical context of the task.

Highlights

Information retrieval (IR) helps clinicians answer questions posed to large collections of electronic medical records (EMRs), such as how best to identify a patient’s cancer stage.
The results show that the medical-context vector space efficiently learned the preferred similar terms of reviewers and outperformed the baseline word2vec embedding in all three chart review tasks, as measured by the area under the receiver operating characteristic curve (AUROC).
We evaluated the performance of the medical-context vector space in predicting the preferred similar terms of reviewers in three chart review tasks.

Summary

Introduction

Information retrieval (IR) helps clinicians answer questions posed to large collections of electronic medical records (EMRs), such as how best to identify a patient’s cancer stage. Chart reviews are relied upon to answer a wide range of questions, from determining the current stage of cancer for a particular patient to identifying which drugs are most frequently ordered for the treatment of seizures. These different chart review tasks can be assisted by query expansion methods. However, given the range of chart review tasks that can derive from a single search term, a static set of similar terms is not appropriate for all tasks. The set of similar terms should adjust dynamically based on the task and context of the review.
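To make the idea of task-dependent similar terms concrete, the following is a minimal sketch, not the method evaluated in this study. It assumes each term is described by how often it appears in a few medical contexts, and that a chart review task can be summarized by a weight per context. The context names, the counts, the term list, and the functions `context_vector`, `weighted_cosine`, and `similar_terms` are all hypothetical and chosen only for illustration. In particular, the hand-set task weights stand in for the supervised models that, in the study, learn reviewer preferences from labels.

```python
import math

# Hypothetical medical contexts and usage counts (invented for illustration,
# not taken from the study's data).
CONTEXTS = ["prescription", "family_history", "oncology_note"]
USAGE_COUNTS = {
    "cancer": [5, 40, 55],
    "mets": [1, 2, 97],
    "hereditary": [3, 90, 7],
    "seizure": [60, 25, 15],
}


def context_vector(term):
    """Turn raw counts into a usage distribution over the contexts."""
    counts = USAGE_COUNTS[term]
    total = sum(counts)
    return [c / total for c in counts]


def weighted_cosine(u, v, weights):
    """Cosine similarity in which each context dimension has a task weight."""
    dot = sum(w * a * b for w, a, b in zip(weights, u, v))
    norm_u = math.sqrt(sum(w * a * a for w, a in zip(weights, u)))
    norm_v = math.sqrt(sum(w * b * b for w, b in zip(weights, v)))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_terms(query, task_weights, top_k=2):
    """Rank the other terms by task-weighted similarity to the query term."""
    q = context_vector(query)
    scored = [
        (term, weighted_cosine(q, context_vector(term), task_weights))
        for term in USAGE_COUNTS
        if term != query
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Assumed task weightings: one emphasizes oncology notes, the other
    # emphasizes family history.
    print(similar_terms("cancer", [1, 1, 5]))
    print(similar_terms("cancer", [1, 5, 1]))
```

With these toy numbers, the same query term cancer is expanded first with mets when oncology notes are weighted most heavily and first with hereditary when family history is weighted most heavily, which is the kind of task-dependent adjustment a static list of similar terms cannot provide.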

Objectives

Methods

Results

Discussion

Conclusion