Abstract

Currently, radiation oncology-specific electronic medical records (EMRs) allow providers to input the radiation treatment site using free text. The purpose of this study is to develop a natural language processing (NLP) tool to extract encoded data from radiation treatment sites in an EMR.Treatment sites were extracted from all patients who completed treatment in our department from April 1, 2011, to April 30, 2013. A system was designed to extract the Unified Medical Language System (UMLS) concept codes using a sample of 11,018 unique site names from 31118 radiation therapy (RT) sites. Among those, 5500 unique site name strings that constitute approximately half of the sample were spared as a test set to evaluate the final system. A dictionary and calculated n-gram statistics using UMLS concepts from related semantic types were combined with manually encoded data.There was an average of 2.2 sites per patient. Prior to extraction, the 20 most common unique treatment sites were used 4215 times (38.3%). The most common treatment site was whole brain RT, which was entered using 27 distinct terms for a total of 1063 times. The customized NLP solution displayed great gains as compared to other systems, with a recall of 0.99 and a precision of 0.99.A customized NLP tool was extracting encoded data from radiation treatment sites in an EMR with great accuracy. This can be integrated into a repository of demographic, genomic, treatment, and outcome data to advance personalized oncologic care.

Highlights

  • A National Radiation Oncology Registry (NROR) has been created through a collaboration between the Radiation Oncology Institute (ROI) and the American Society for Radiation Oncology (ASTRO) to develop a national database that will be used to study the clinical outcomes and patterns of care. Inherent in this effort is the need for automated tools to extract clinical information from radiation oncology electronic medical records (EMRs)

  • Previous studies have demonstrated the feasibility of extracting meaningful clinical data using natural language processing (NLP) tools from diagnoses [1], problem lists [2], pathology reports [3,4], and radiology reports [5,6,7,8]

  • The purpose of this study is to develop an NLP tool to extract encoded data from radiation treatment sites in an EMR

Read more

Summary

Introduction

Previous studies have demonstrated the feasibility of extracting meaningful clinical data using natural language processing (NLP) tools from diagnoses [1], problem lists [2], pathology reports [3,4], and radiology reports [5,6,7,8] These tools are not designed to handle the complexities of radiation therapy (RT) site names, which include many abbreviations specific to our field. For this analysis, information was obtained from the RT delivery (record and verify) electronic medical record (MOSAIQ®, Elekta Care Management, Stockholm), which allows manual, free-text input of the desired treatment site. PCI: percutaneous coronary intervention; WBRT: whole-brain radiation therapy; WBXRT: whole-brain radiotherapy

Discussion
Disclosures
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call