Clinical notes contain valuable information for research and for monitoring quality of care. Named Entity Recognition (NER) is the process of identifying relevant pieces of information, such as diagnoses, treatments, and side effects, and bringing them into a more structured form. Although recent advances in deep learning have facilitated automated recognition, particularly in English, NER can still be challenging due to limited specialized training data. This is exacerbated in hospital settings, where annotations are costly to obtain without appropriate incentives and often depend on local specificities. In this work, we study whether this annotation process can be effectively accelerated by combining two practical strategies. First, we convert usually passive annotation tasks into a proactive contest to motivate human annotators to perform a task often considered tedious and time-consuming. Second, we provide pre-annotations to the participants to evaluate how the recall and precision of the pre-annotations can boost or degrade annotation performance. We applied both strategies to a text de-identification task on French clinical notes and discharge summaries at a large Swiss university hospital. Our results show that a proactive contest and average-quality pre-annotations can significantly reduce annotation time and increase annotation quality, enabling us to develop a high-performance text de-identification model for French clinical notes (F1 score 0.94).