Abstract
BackgroundSemantic similarity estimation significantly promotes the understanding of natural language resources and supports medical decision making. Previous studies have investigated semantic similarity and relatedness estimation between biomedical terms through resources in English, such as SNOMED-CT or UMLS. However, very limited studies focused on the Chinese language, and technology on natural language processing and text mining of medical documents in China is urgently needed. Due to the lack of a complete and publicly available biomedical ontology in China, we only have access to several modest-sized ontologies with no overlaps. Although all these ontologies do not constitute a complete coverage of biomedicine, their coverage of their respective domains is acceptable. In this paper, semantic similarity estimations between Chinese biomedical terms using these multiple non-overlapping ontologies were explored as an initial study. MethodsTypical path-based and information content (IC)-based similarity measures were applied on these ontologies. From the analysis of the computed similarity scores, heterogeneity in the statistical distributions of scores derived from multiple ontologies was discovered. This heterogeneity hampers the comparability of scores and the overall accuracy of similarity estimation. This problem was addressed through a novel language-independent method by combining semantic similarity estimation and score normalization. A reference standard was also created in this study. ResultsCompared with the existing task-independent normalization methods, the newly developed method exhibited superior performance on most IC-based similarity measures. The accuracy of semantic similarity estimation was enhanced through score normalization. This enhancement resulted from the mitigation of heterogeneity in the similarity scores derived from multiple ontologies. ConclusionWe demonstrated the potential necessity of score normalization when estimating semantic similarity using ontology-based measures. The results of this study can also be extended to other language systems to implement semantic similarity estimation in biomedicine.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have