Natural language processing pipeline for temporal information extraction and classification from free text eligibility criteria

Gayathri Parthasarathy, Paul Anderson, Aspen Olmsted

doi:10.1109/i-society.2016.7854192

Abstract

Automating information extraction from eligibility criteria would be a breakthrough for the effective use of information in patient search over clinical databases. A majority of eligibility criteria contain temporal information associated with medical conditions and events. This project creates a novel natural language processing (NLP) pipeline that extracts temporal information from free-text eligibility criteria and classifies it as historic, current, or planned. The pipeline uses pattern learning algorithms to extract temporal information and a trained Random Forest classifier to classify it. The pipeline achieved an accuracy of 0.82 in temporal data detection and classification, with an average precision of 0.83 and an average recall of 0.80 in temporal data classification.
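The two-stage design described in the abstract (first detect temporal expressions, then label them historic, current, or planned) can be illustrated with a minimal sketch. This is not the authors' implementation. The patterns in `CUE_PATTERNS` are hypothetical hand-written regexes standing in for the learned extraction patterns, and the majority vote in `classify` is a deliberately simpler substitute for the trained Random Forest classifier. The names `TemporalMention`, `extract`, and `classify` are likewise hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical cue patterns, one per class. The paper's pipeline *learns* its
# extraction patterns; these fixed regexes only illustrate the idea.
UNIT = r"(?:days?|weeks?|months?|years?)"
CUE_PATTERNS = {
    "historic": re.compile(rf"\b(?:history of|prior|previous(?:ly)?|past \d+ {UNIT})\b", re.I),
    "current": re.compile(r"\b(?:current(?:ly)?|ongoing|active|at (?:the time of )?screening)\b", re.I),
    "planned": re.compile(rf"\b(?:planned|scheduled|anticipated|next \d+ {UNIT})\b", re.I),
}


@dataclass
class TemporalMention:
    text: str       # matched temporal expression
    start: int      # character offset within the criterion
    cue_class: str  # class suggested by the pattern that matched


def extract(criterion: str) -> list[TemporalMention]:
    """Stage 1: locate temporal expressions by pattern matching."""
    mentions = [
        TemporalMention(m.group(0), m.start(), label)
        for label, pattern in CUE_PATTERNS.items()
        for m in pattern.finditer(criterion)
    ]
    # Return mentions in reading order.
    return sorted(mentions, key=lambda mention: mention.start)


def classify(mentions: list[TemporalMention]) -> str | None:
    """Stage 2 stand-in: majority vote over cue classes.

    The paper uses a trained Random Forest here; a vote keeps this sketch
    dependency-free. Returns None when no temporal information was found.
    """
    if not mentions:
        return None
    votes = Counter(mention.cue_class for mention in mentions)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    criterion = "History of myocardial infarction within the past 6 months"
    found = extract(criterion)
    print([m.text for m in found])  # ['History of', 'past 6 months']
    print(classify(found))          # historic
```

Keeping detection and classification as separate functions mirrors the two stages named in the abstract, so either stage could be swapped out on its own, for example by training a classifier on features of each `TemporalMention` instead of voting.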
