Abstract

Electronic Medical Record (EMR) systems store patients' medical information either in structured form or as unstructured free text, such as clinical reports. Pathology notes are one type of clinical report; they may contain cancer-related information such as diagnoses and descriptions of tissue samples. Data in clinical documents can contribute up to 20% additional knowledge beyond the structured data stored in discrete fields. However, extracting information from these documents is non-trivial and can be time-consuming. We evaluated several open-source natural language processing (NLP) tools for extracting terms of interest from pathology documents and integrating them with data already stored in the institutional data warehouse (EDW). Although many of the evaluated tools offer a wide range of features, none suits our immediate need of extracting key pathology terms. This paper describes our in-house framework, which identifies and extracts pathology data points from pathology documents, curates them, and loads them into the EDW. We evaluated the performance of the proposed model and validated the extracted terms against data stored in the institutional electronic medical record system.
