Abstract

A scientific ontology is a formal representation of knowledge within a domain, typically including central concepts, their properties, and relations. With the rise of computers and high-throughput data collection, ontologies have become essential to data mining and sharing across communities in the biomedical sciences. Powerful approaches exist for testing the internal consistency of an ontology, but not for assessing the fidelity of its domain representation. We introduce a family of metrics that describe the breadth and depth with which an ontology represents its knowledge domain. We then test these metrics using (1) four of the most common medical ontologies with respect to a corpus of medical documents and (2) seven of the most popular English thesauri with respect to three corpora that sample language from medicine, news, and novels. Here we show that our approach captures the quality of ontological representation and guides efforts to narrow the breach between ontology and collective discourse within a domain. Our results also demonstrate key features of medical ontologies, English thesauri, and discourse from different domains. Medical ontologies have a small intersection, as do English thesauri. Moreover, dialects characteristic of distinct domains vary strikingly as many of the same words are used quite differently in medicine, news, and novels. As ontologies are intended to mirror the state of knowledge, our methods to tighten the fit between ontology and domain will increase their relevance for new areas of biomedical science and improve the accuracy and power of inferences computed across them.

Highlights

  • Controlled terminologies and ontologies are indispensable for modern biomedicine [1]

  • Analysis of biomedical ontologies: To demonstrate our approach to the comparison of biomedical ontologies, we identified concepts associated with disease phenotypes and relations in four medical ontologies: ICD9-CM [48,54], Clinical Problem Statement System (CCPSS) [55], SNOMED CT [56], and Medical Subject Headings (MeSH)

  • Comparing each medical ontology concept-by-concept, we found that despite a reasonable overlap in biomedical terms and concepts, different ontologies intersect little in their relations
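
A concept-by-concept comparison of this kind can be illustrated with a minimal sketch. It assumes each ontology is reduced to a set of concept names and a set of undirected concept pairs (relations), and it uses the Jaccard index as a stand-in overlap measure; the text above does not specify the measure actually used. The ontology names "A" and "B", their contents, and the helper names `jaccard` and `pairwise_overlap` are all hypothetical toy values, not data from the four medical ontologies.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy ontologies (hypothetical): concepts plus undirected relations between them.
# Relations are stored as frozensets so that (x, y) and (y, x) compare equal.
ontologies = {
    "A": {
        "concepts": {"asthma", "cough", "fever", "wheezing"},
        "relations": {frozenset(("asthma", "wheezing")), frozenset(("fever", "cough"))},
    },
    "B": {
        "concepts": {"asthma", "cough", "fever", "rash"},
        "relations": {frozenset(("asthma", "cough")), frozenset(("fever", "rash"))},
    },
}


def pairwise_overlap(onts):
    """Return {(name1, name2): (concept_overlap, relation_overlap)} for every pair."""
    result = {}
    for n1, n2 in combinations(sorted(onts), 2):
        concept_overlap = jaccard(onts[n1]["concepts"], onts[n2]["concepts"])
        relation_overlap = jaccard(onts[n1]["relations"], onts[n2]["relations"])
        result[(n1, n2)] = (concept_overlap, relation_overlap)
    return result


if __name__ == "__main__":
    for pair, (c, r) in pairwise_overlap(ontologies).items():
        print(pair, f"concepts={c:.2f}", f"relations={r:.2f}")
```

In this toy case the two ontologies share three of five concepts but none of their relations, which mirrors the qualitative pattern described above: substantial overlap in terms, little overlap in how those terms are linked.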


Summary

Introduction

Controlled terminologies and ontologies are indispensable for modern biomedicine [1]. In the early 1970s, explicit approaches to knowledge representation emerged in artificial intelligence [3], and in the 1990s they were christened ontologies in computer science [4]. These representations were promoted as stable schemas for data (a kind of object-oriented content) to facilitate data sharing and reuse. Biomedical scientists use ontologies to encode the results of complex experiments and observations consistently, and analysts use the resulting data to integrate and model system properties. Ontologies facilitate data storage, sharing between scientists and subfields, integrative analysis, and computational reasoning across many more facts than scientists can consider with traditional means.

