The Japanese medical device adverse event terminology published by the Japan Federation of Medical Devices Associations (JFMDA terminology) contains entries for 89 terminology items, each created independently. The consistency of these entries needs to be established and verified, and the entries need to be mapped efficiently and accurately, so an automatic synonym detection tool is an important need. Approaches based on edit distances and distributed representations have performed well in previous studies. The purpose of this study was to identify synonyms in JFMDA terminology with these algorithms and to evaluate their accuracy. A total of 125 definition sentence pairs were created from the terminology as a baseline. Edit distances (Levenshtein and Jaro–Winkler distance) and distributed representations (Word2vec, fastText, and Doc2vec) were used to calculate similarities. Receiver operating characteristic (ROC) analysis was carried out to evaluate the accuracy of synonym detection. A comparison of the algorithms showed that the Jaro–Winkler distance had the highest sensitivity, Doc2vec with the distributed memory (DM) model had the highest specificity, and the Levenshtein distance had the highest area under the curve (AUC). Edit distances and Doc2vec make it possible to predict synonyms in JFMDA terminology with high accuracy.
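As an illustration of the evaluation idea described above, the following is a minimal sketch of scoring sentence pairs with a normalized Levenshtein similarity and summarizing the result as an AUC. It is not the study's implementation: the function names (`levenshtein`, `similarity`, `auc`) and the toy English sentence pairs are hypothetical, the normalization by the longer string's length is an assumption, and the AUC is computed with the rank-based (Mann-Whitney) equivalent of the area under the ROC curve. Jaro–Winkler distance and the distributed representations are omitted.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; normalizing by the longer length is an assumption."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def auc(scores, labels):
    """Probability that a synonym pair outscores a non-synonym pair (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical definition pairs: label 1 = synonym, 0 = not a synonym.
pairs = [
    ("device stopped working", "device stops working", 1),
    ("battery depleted early", "battery depleted too early", 1),
    ("device stopped working", "skin irritation at site", 0),
    ("battery depleted early", "alarm failed to sound", 0),
]

scores = [similarity(a, b) for a, b, _ in pairs]
labels = [y for _, _, y in pairs]
print("AUC on toy pairs:", auc(scores, labels))
```

Sensitivity and specificity at a given operating point would follow from choosing a similarity threshold on the same scores; how that threshold is chosen is left open here.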