Abstract

Objective

To investigate: (1) the annotation time savings achieved by various active learning query strategies compared with supervised learning and a random sampling baseline, and (2) the benefits of active learning-assisted pre-annotations in accelerating manual annotation compared with de novo annotation.

Materials and Methods

The train and test sets of the concept extraction task in the i2b2/VA 2010 challenge contain 73 and 120 discharge summary reports, respectively, provided by Beth Israel Deaconess Medical Center. The 73 training reports were used in user study experiments on manual annotation. First, all sequences within the 73 reports were manually annotated from scratch. Next, active learning models were built to generate pre-annotations for the sequences selected by a query strategy. The annotation/reviewing time per sequence was recorded. The 120 test reports were used to measure the effectiveness of the active learning models.

Results

When annotating from scratch, active learning reduced annotation time by up to 35% and 28% compared with a fully supervised approach and a random sampling baseline, respectively. Reviewing active learning-assisted pre-annotations reduced annotation time by a further 20% compared with de novo annotation.

Discussion

The number of concepts that require manual annotation is a good indicator of the annotation time of the various active learning approaches, as demonstrated by the high correlation between the time rate and the concept annotation rate.

Conclusion

Active learning plays a key role in reducing the time required to manually annotate domain concepts in clinical free text, both when annotating from scratch and when reviewing active learning-assisted pre-annotations.
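To illustrate the workflow the abstract describes, below is a minimal sketch of an uncertainty-sampling active learning loop that selects the sequences a model is least confident about and generates pre-annotations for an annotator to review. It is not the paper's implementation: the paper uses sequence-labeling models over i2b2 discharge summaries, whereas this sketch substitutes a simple scikit-learn classifier on toy sequence-level labels, and the names (select_batch, least_confidence, the example texts) are illustrative assumptions.

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

def least_confidence(probs):
    # Uncertainty score: 1 minus the maximum class probability.
    return 1.0 - probs.max(axis=1)

def select_batch(model, vec, unlabeled_texts, batch_size=10):
    # Query strategy: pick the sequences the current model is
    # least confident about (most informative to annotate next).
    X = vec.transform(unlabeled_texts)
    scores = least_confidence(model.predict_proba(X))
    return np.argsort(scores)[::-1][:batch_size]

# Toy usage: sequence-level stand-ins for i2b2-style concept labels.
labeled_texts = ["patient denies chest pain", "started on aspirin"]
labels = ["problem", "treatment"]
unlabeled_texts = ["shortness of breath on exertion",
                   "continue metoprolol 25 mg daily",
                   "chest x-ray showed no infiltrate"]

vec = CountVectorizer().fit(labeled_texts + unlabeled_texts)
model = LogisticRegression(max_iter=1000)
model.fit(vec.transform(labeled_texts), labels)

for idx in select_batch(model, vec, unlabeled_texts, batch_size=2):
    pre_annotation = model.predict(vec.transform([unlabeled_texts[idx]]))[0]
    # The annotator reviews/corrects this pre-annotation rather than
    # labeling from scratch, which is where the time savings arise.
    print(unlabeled_texts[idx], "->", pre_annotation)

In the study's setting this loop would iterate: newly reviewed annotations are added to the labeled pool, the model is retrained, and the query strategy selects the next batch.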
