A novel view on information content of concepts in a large ontology and a view on the structure and the quality of the ontology

Carl Van Buggenhout, Werner Ceusters

doi:10.1016/j.ijmedinf.2004.03.009

Abstract

Semantic distance and semantic similarity are two important information retrieval measures used in word sense disambiguation as well as for assessing how relevant concepts are to the documents in which they are found. A variety of calculation methods have been proposed in the literature, whereby methods that take into account the information content of an individual concept outperform those that do not. In this paper, we present a novel recursive approach to calculating a concept's information content based on the information content of the concepts to which it relates. The method is applicable to extremely large ontologies containing several million concepts and relationships among them. It is shown that a concept's information content as calculated by this method provides additional information about an ontology that cannot be approximated by hierarchical edge-counting or human insight. In addition, it is suggested that the method can be used for quality control within large ontologies and that it can give an impression of the structure and the quality of the ontology.

Full Text