Abstract

The adoption of electronic medical records (EMRs) produces a huge amount of unstructured clinical text. This domain-specific text has become an important target for temporal information extraction (TIE) because it is rich in temporal content and valuable for medical care. Processing temporal information in clinical text is considerably harder than in newswire text, because temporal information is often expressed implicitly and the text is domain-specific, unstructured, and of variable writing quality. Despite these difficulties, existing work has established various methods for extracting temporal information with the help of annotated corpora. However, annotated corpora are costly and time-consuming to prepare, and their small size inevitably limits extraction quality. Motivated by this, we propose a novel two-stage semi-supervised framework that exploits abundant unannotated clinical text to automatically detect temporal information, gradually increase the size of the annotated corpora, and thereby improve temporal information extraction accuracy. In a pilot study of stage one of the proposed framework, we developed a conditional random fields (CRFs) model for temporal event and temporal expression extraction, trained on the annotated data with various feature sets at the phrase level. We first generated candidate features from the annotated corpora and selected the significant ones; we then trained and evaluated the model with the selected features. Our model achieved an F-measure of 81.34% for event recognition and 79.95% for temporal expression extraction.
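As a rough illustration of the stage-one pipeline described above, the following sketch shows what phrase-level feature generation and F-measure scoring might look like. It is not the authors' implementation: the feature names (`lower`, `has_digit`, `head_suffix3`, and so on), the span representation, and both function names are hypothetical, chosen only to make the idea concrete. The feature dictionaries are in the general shape that CRF toolkits typically accept, but no particular toolkit is assumed here.

```python
def phrase_features(phrases, i):
    """Build a feature dict for the i-th phrase of a sentence.

    The feature set is a hypothetical example, not the one used in the paper.
    """
    phrase = phrases[i]
    tokens = phrase.split()
    feats = {
        "lower": phrase.lower(),
        "n_tokens": len(tokens),
        # Digits are a common surface cue for dates, times, and durations.
        "has_digit": any(ch.isdigit() for ch in phrase),
        # Last three characters of the final token, a crude head-word cue.
        "head_suffix3": tokens[-1][-3:].lower() if tokens else "",
    }
    # Context features from the neighbouring phrases.
    feats["prev_lower"] = phrases[i - 1].lower() if i > 0 else "<BOS>"
    feats["next_lower"] = (
        phrases[i + 1].lower() if i < len(phrases) - 1 else "<EOS>"
    )
    return feats


def f_measure(gold, predicted):
    """Balanced F-measure over collections of (start, end, label) spans.

    Exact span matching is assumed here; the paper's matching criterion
    is not stated in the abstract.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)
```

In such a setup, one feature dictionary would be produced per phrase and passed, sentence by sentence, to a CRF trainer; feature selection would then amount to dropping dictionary keys that do not help on held-out data.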
