Methods for Building Sense Inventories of Abbreviations in Clinical Notes

Hua Xu, P D Stetson, Carol Friedman

doi:10.1197/jamia.m2927

Abstract

Objective: To develop methods for building corpus-specific sense inventories of abbreviations occurring in clinical documents.

Design: A corpus of internal medicine admission notes was collected, and the instances of each clinical abbreviation in the corpus were grouped into sense clusters. One instance from each cluster was manually annotated to generate a final list of senses. Two clustering-based methods (Expectation Maximization, EM, and Farthest First, FF) and one random sampling method for sense detection were evaluated using a set of 12 clinical abbreviations.

Measurements: The clustering-based sense detection methods were evaluated using a set of clinical abbreviations that were manually sense annotated. “Sense Completeness” and “Annotation Cost” were used to measure the performance of the different methods. Clustering error rates were also reported for the different clustering algorithms.

Results: A clustering-based semi-automated method was developed to build corpus-specific sense inventories for abbreviations in hospital admission notes. Evaluation demonstrated that this method could substantially reduce manual annotation cost and increase the completeness of sense inventories when compared with a manual annotation method using random samples.

Conclusion: The authors developed an effective clustering-based method for building corpus-specific sense inventories for abbreviations in a clinical corpus. To the best of the authors' knowledge, this is the first time clustering technologies have been used to help build sense inventories of abbreviations in clinical text. The results demonstrated that the clustering-based method performed better than the manual annotation method using random samples for the task of building sense inventories of clinical abbreviations.
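
The workflow summarized in the abstract (cluster the instances of an abbreviation, annotate one instance per cluster, then measure how many senses were recovered) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy sentences, the bag-of-words context features, the cosine distance, the fixed starting instance, and the function names. Only the farthest-first selection of representatives is shown; the EM variant is not.

```python
from collections import Counter
import math


def context_vector(tokens, target, window=3):
    """Bag-of-words counts of tokens within `window` positions of `target` (assumed feature set)."""
    i = tokens.index(target)
    lo = max(0, i - window)
    return Counter(tokens[lo:i] + tokens[i + 1:i + 1 + window])


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)


def farthest_first(vectors, k):
    """Pick up to k representative instances by farthest-first traversal.

    Starts from instance 0 so the result is deterministic (an assumption;
    implementations often start from a random instance).
    """
    centers = [0]
    dist = [cosine_distance(v, vectors[0]) for v in vectors]
    while len(centers) < min(k, len(vectors)):
        # Next representative: the instance farthest from all chosen ones.
        nxt = max(range(len(vectors)), key=lambda i: dist[i])
        centers.append(nxt)
        dist = [min(d, cosine_distance(v, vectors[nxt]))
                for d, v in zip(dist, vectors)]
    return centers


def sense_completeness(found, all_senses):
    """Fraction of the gold-standard senses covered by the annotated instances."""
    return len(set(found) & set(all_senses)) / len(set(all_senses))


# Hypothetical instances of the abbreviation "pt" with gold-standard senses.
notes = [
    ("pt denies chest pain", "patient"),
    ("pt reports chest pain", "patient"),
    ("pt denies fever", "patient"),
    ("elevated pt and inr today", "prothrombin time"),
    ("pt and inr elevated", "prothrombin time"),
    ("referred to pt for gait training", "physical therapy"),
]

vectors = [context_vector(text.split(), "pt") for text, _ in notes]
representatives = farthest_first(vectors, k=3)

# Only the representatives are "annotated"; their senses form the inventory.
inventory = [notes[i][1] for i in representatives]
annotation_cost = len(representatives)
completeness = sense_completeness(inventory, [sense for _, sense in notes])
```

In this toy setup, annotating three representatives rather than all six instances is what "Annotation Cost" is meant to capture, and `completeness` corresponds to "Sense Completeness"; a random-sampling baseline would draw the same number of instances uniformly at random and compute the same two quantities.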

Full Text