Abstract

IntroductionHospitals generate large amounts of data and this data is generally modeled and labeled in a proprietary way, hampering its exchange and integration. Manually annotating data element names to internationally standardized data element identifiers is a time-consuming effort. Tools can support performing this task automatically. This study aimed to determine what factors influence the quality of automatic annotations. MethodsData element names were used from the Dutch COVID-19 ICU Data Warehouse containing data on intensive care patients with COVID-19 from 25 hospitals in the Netherlands. In this data warehouse, the data had been merged using a proprietary terminology system while also storing the original hospital labels (synonymous names). Usagi, an OHDSI annotation tool, was used to perform the annotation for the data. A gold standard was used to determine if Usagi made correct annotations. Logistic regression was used to determine if the number of characters, number of words, match score (Usagi's certainty) and hospital label origin influenced Usagi's performance to annotate correctly. ResultsUsagi automatically annotated 30.5% of the data element names correctly and 5.5% of the synonymous names. The match score is the best predictor for Usagi finding the correct annotation. It was determined that the AUC of data element names was 0.651 and 0.752 for the synonymous names respectively. The AUC for the individual hospital label origins varied between 0.460 to 0.905. DiscussionThe results show that Usagi performed better to annotate the data element names than the synonymous names. The hospital origin in the synonymous names dataset was associated with the amount of correctly annotated concepts. Hospitals that performed better had shorter synonymous names and fewer words. Using shorter data element names or synonymous names should be considered to optimize the automatic annotating process. Overall, the performance of Usagi is too poor to completely rely on for automatic annotation.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call