Analysing Syntactic Regularities and Irregularities in SNOMED-CT

Eleni Mikroyannidi, Robert Stevens, Luigi Iannone, Alan Rector

doi:10.1186/2041-1480-3-8

Open Access

https://doi.org/10.1186/2041-1480-3-8

Journal: Journal of biomedical semantics	Publication Date: Jan 1, 2012
Citations: 25	License type: cc-by

Affiliation: University of Manchester

Abstract

Motivation: In this paper we demonstrate the usage of RIO, a framework for detecting syntactic regularities using cluster analysis of the entities in the signature of an ontology. Quality assurance in ontologies is vital for their use in real applications, but it is also a complex and difficult task. Methods and tools for it are particularly important when the ontology lacks documentation and the user cannot consult the ontology developers to understand its construction. One aspect of quality assurance is checking how well an ontology complies with established ‘coding standards’: is the ontology regular in how descriptions of different types of entities are axiomatised? Is there a similar way to describe them, and are there any corner cases that are not covered by a pattern? Detection of regularities and irregularities in axiom patterns should provide ontology authors and quality inspectors with a level of abstraction such that checking compliance with coding standards can be automated. However, there is a lack of such reverse ontology engineering methods and tools.

Results: The RIO framework allows regularities, i.e. repetitive structures in the axioms of an ontology, to be detected in an OWL ontology. We describe the use of standard machine learning approaches to make clusters of similar entities and generalise over their axioms to find regularities. This abstraction allows matches to, and deviations from, an ontology’s patterns to be shown. We demonstrate its usage with the inspection of three modules from SNOMED-CT, a large medical terminology, that cover “Present” and “Absent” findings, as well as “Chronic” and “Acute” findings. The module sizes are 5 065, 20 688 and 19 812 asserted axioms. The modules are analysed in terms of the types and numbers of regularities and irregularities in their asserted axioms. The analysis showed that some modules of the terminology, which were expected to instantiate a pattern described in the SNOMED-CT technical guide, had a high number of regularity deviations. A subset of these were categorised as “design defects” by verifying them against past work on the quality assurance of SNOMED-CT. These were mainly incomplete descriptions. In the worst case, the expected patterns described in the technical guide were followed by only 5% of the axioms in the module.

Conclusion: It is possible to automatically detect regularities and then inspect irregularities in an ontology. We argue that RIO is a tool to find and report such matches and mismatches for evaluation by domain experts. We have demonstrated that standard clustering techniques from machine learning can offer a tool in the drive for quality assurance in ontologies.

Availability: http://riotool.sourceforge.net/

Contact: eleni.mikroyannidi@manchester.ac.uk, robert.stevens@manchester.ac.uk

Summary

Introduction

Ontologies provide an effective way of creating, using and sharing medical and biological vocabularies [1]. Ontology construction can be based upon patterns at different levels of abstraction; these can be notes from the developers, general guidelines, formal documentation of the ontology, published papers describing the ontology, spreadsheets of fillers for ontology templates, etc. [4,5,6]. Such documentation or coding standards should offer an opportunity for quality assurance, if deviations from those guidelines can be found. The question is how an ontology author can effectively and efficiently find not only the classes that conform to a pattern, but also the classes that do not.
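To make the idea of conforming and deviating classes concrete, the following is a minimal Python sketch, not RIO's implementation. It assumes a toy representation in which each axiom is a (class, property, filler) triple; all class and property names are hypothetical and are not taken from SNOMED-CT. Fillers are abstracted away so that each class is reduced to the set of properties used in its description, and exact grouping on that set stands in for the clustering step, which in a real tool would rely on a similarity measure.

```python
from collections import defaultdict

# Hypothetical toy axioms, written as (class, property, filler) triples.
# The names are illustrative only and are not taken from SNOMED-CT.
AXIOMS = [
    ("ChronicGastritis", "hasClinicalCourse", "Chronic"),
    ("ChronicGastritis", "hasFindingSite", "Stomach"),
    ("ChronicSinusitis", "hasClinicalCourse", "Chronic"),
    ("ChronicSinusitis", "hasFindingSite", "NasalSinus"),
    ("ChronicFatigue", "hasFindingSite", "WholeBody"),  # no clinical course
]


def generalise(axioms):
    """Reduce each class to the set of properties used to describe it.

    Dropping the fillers is a crude form of generalisation: two classes
    get the same 'shape' if they are described with the same properties.
    """
    shapes = defaultdict(set)
    for cls, prop, _filler in axioms:
        shapes[cls].add(prop)
    return {cls: frozenset(props) for cls, props in shapes.items()}


def group_by_shape(shapes):
    """Group classes whose generalised descriptions are identical.

    Exact grouping is an assumption made for brevity; it stands in for
    the clustering of similar entities.
    """
    clusters = defaultdict(list)
    for cls, props in shapes.items():
        clusters[props].append(cls)
    return {props: sorted(members) for props, members in clusters.items()}


def check_pattern(shapes, required):
    """Split classes into those that use every required property and the rest."""
    conforming = sorted(c for c, p in shapes.items() if required <= p)
    deviating = sorted(c for c, p in shapes.items() if not (required <= p))
    return conforming, deviating


if __name__ == "__main__":
    shapes = generalise(AXIOMS)
    for props, members in group_by_shape(shapes).items():
        print(sorted(props), "->", members)

    # Hypothetical expected pattern: every class states a clinical course.
    conforming, deviating = check_pattern(shapes, frozenset({"hasClinicalCourse"}))
    print("conforming:", conforming)
    print("deviating:", deviating)
    print("coverage:", len(conforming) / len(shapes))
```

In this toy setting, the deviating list plays the role of the irregularities that would be handed to a domain expert for inspection, and the coverage ratio is the kind of figure behind a statement such as "the pattern is followed by only a fraction of the module".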
