An Ontology-Based Semantic Similarity Measure Considering Multi-Inheritance in Biomedicine

Fengqin Yang, Hongguang Sun, Siya Chen, Tieli Sun, Yuanyuan Xing

doi:10.1155/2015/305369

Abstract

Computing semantic similarity between words is vital for text understanding in many applications, such as word sense disambiguation, document categorization, and information retrieval. In recent years, different paradigms have been proposed to compute semantic similarity from various ontologies and knowledge resources. In this paper, we propose a new similarity measure that combines the superconcepts of the evaluated concepts with their common specificity feature. The common specificity feature considers the depth of the Least Common Subsumer (LCS) of the two concepts and the depth of the ontology to obtain more semantic evidence. The multiple inheritance phenomenon in a large and complex taxonomy is taken into account by considering all superconcepts of the evaluated concepts. Using SNOMED CT as the input ontology, we evaluate the correlation between our measure and human scores and compare it with that of other existing measures. The experimental evaluations show that the measure is applicable to different datasets and confirm its efficiency and simplicity.
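The abstract names the ingredients of the measure without giving its formula: the full set of superconcepts of each concept (so that every inheritance path counts) and a common specificity feature built from the LCS depth and the ontology depth. The Python sketch below computes only these ingredients on a hypothetical toy taxonomy. It assumes that depth is measured along the longest path to the root (root at depth 1) and that common specificity takes the form D - depth(LCS), where D is the ontology depth; both are assumptions, and the sketch does not reproduce the paper's combined measure. The taxonomy and all function names are illustrative only.

```python
from functools import lru_cache

# Hypothetical toy taxonomy: each concept maps to its direct parents.
# "pneumonia" has two parents, illustrating multiple inheritance.
PARENTS = {
    "clinical_finding": [],
    "disorder": ["clinical_finding"],
    "infection": ["disorder"],
    "lung_disorder": ["disorder"],
    "pneumonia": ["infection", "lung_disorder"],
    "bronchitis": ["lung_disorder"],
}


def superconcepts(concept):
    """Return the concept plus all of its ancestors, following every parent link."""
    seen = {concept}
    stack = [concept]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


@lru_cache(maxsize=None)
def depth(concept):
    """Assumed depth: length of the longest path to the root, root = 1."""
    parents = PARENTS[concept]
    return 1 if not parents else 1 + max(depth(p) for p in parents)


def lcs(c1, c2):
    """Least Common Subsumer: the deepest concept in both superconcept sets.
    Sorting first makes tie-breaking between equally deep candidates deterministic."""
    common = superconcepts(c1) & superconcepts(c2)
    return max(sorted(common), key=depth)


def ontology_depth():
    """D: the maximum depth over all concepts in the taxonomy."""
    return max(depth(c) for c in PARENTS)


def common_specificity(c1, c2):
    """Assumed form D - depth(LCS): smaller values mean a deeper, more specific LCS."""
    return ontology_depth() - depth(lcs(c1, c2))


def superconcept_overlap(c1, c2):
    """(shared, total) superconcept counts, taking all inheritance paths into account."""
    t1, t2 = superconcepts(c1), superconcepts(c2)
    return len(t1 & t2), len(t1 | t2)


if __name__ == "__main__":
    print(lcs("pneumonia", "bronchitis"))                   # lung_disorder
    print(common_specificity("pneumonia", "bronchitis"))    # 1
    print(superconcept_overlap("pneumonia", "bronchitis"))  # (3, 6)
```

If only the pneumonia-to-infection path were followed, lung_disorder would be missed and the LCS would drop to disorder, which is the kind of information loss that considering all superconcepts is meant to avoid.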

Highlights

In the last few years, the amount of available electronic information has increased sharply in many research areas such as biomedicine, education, psychology, linguistics, cognitive science, and artificial intelligence.
There are no standard human-rated datasets for semantic similarity comparable to the manually rated concept sets created by Rubenstein and Goodenough [32] and Miller and Charles [33].
Pedersen et al. [11] stated that sets of manually scored words must be chosen to evaluate concept semantic similarity measures in biomedicine.
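Evaluating a measure against manually scored word sets amounts to correlating its outputs with the human ratings for the same concept pairs. The Python sketch below computes Pearson's r for that purpose. The choice of Pearson (rather than, for example, Spearman) is an assumption, and the scores are made-up placeholders, not values from the paper or from any published dataset.

```python
import math


def pearson(xs, ys):
    """Pearson's r between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values for four hypothetical concept pairs (not real data).
human_scores = [4.0, 3.0, 2.0, 1.0]    # averaged human ratings
measure_scores = [0.9, 0.7, 0.4, 0.1]  # outputs of the measure under test

if __name__ == "__main__":
    print(round(pearson(human_scores, measure_scores), 3))  # 0.996
```

A higher r means the measure ranks and spaces concept pairs more like human raters do; comparing measures on the same rated set is what makes such a dataset useful.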

Summary

Introduction

In the last few years, the amount of available electronic information has increased sharply in many research areas such as biomedicine, education, psychology, linguistics, cognitive science, and artificial intelligence. Processing this textual information from a semantic perspective has therefore become an urgent issue. Understood as the degree of taxonomical proximity, semantic similarity measures the likeness between words and plays an important role in applications across these fields, such as word sense disambiguation [1], word spelling correction [2], automatic language translation [3], document categorization or clustering [4], information extraction and retrieval [5, 6, 7], redundancy detection, and ontology learning [8, 9]. Notably, many applications of semantic similarity computation have been discussed in the biomedical domain, owing to the availability of numerous medical ontologies and resources that organize medical concepts into hierarchies. For example, semantic similarity between concepts of ontologies such as the Gene Ontology [10, 11] has been computed to assess protein functional similarity [6].

Results

Discussion

Conclusion