Abstract

The Gene Ontology (GO) has become the internationally accepted standard for representing function, process, and location aspects of gene products. The wealth of GO annotation data provides a valuable source of implicit knowledge of relationships among these aspects. We describe a new method for association rule mining to discover implicit co-occurrence relationships across the GO sub-ontologies at multiple levels of abstraction. Prior work on association rule mining in the GO has concentrated on mining knowledge at a single level of abstraction and/or between terms from the same sub-ontology. We have developed a bottom-up generalization procedure called Cross-Ontology Data Mining-Level by Level (COLL) that takes into account the structure and semantics of the GO, generates generalized transactions from annotation data, and mines interesting multi-level cross-ontology association rules. We applied our method to publicly available chicken and mouse GO annotation datasets and mined 5368 and 3959 multi-level cross-ontology rules from the two datasets, respectively. We show that, as evaluated by biologists, our approach discovers more association rules from the GO, and rules of higher quality, than previously published methods. Biologically interesting rules discovered by our method reveal unknown and surprising knowledge about co-occurring GO terms.

Highlights

  • The Gene Ontology (GO) is the de facto standard for describing characteristics of gene products [1]

  • We introduce an approach for mining interesting multi-level association rules across the three acyclic graphs used to represent the sub-ontologies of the GO: Cellular Component (CC), Molecular Function (MF) and Biological Process (BP)

  • We show that the three sub-ontologies of the GO exhibit different distributions of terms across levels of abstraction in the structure of the GO and in annotations assigned to datasets

Summary

Introduction

The Gene Ontology (GO) is the de facto standard for describing characteristics of gene products [1]. They achieve generalization by replacing each GO annotation with all of the GO terms on all of the paths from the term to the root of the ontology. This approach has two major shortcomings: 1) it will discover parent-child relationships among terms that are already known, and 2) many of the rules will involve very high-level GO terms that carry little information. Px and Ox are calculated using information from the annotation dataset and the ontology structure. They use this approach to generate automatic slim sets from the GO, but it is unclear how this approach will work for mining associations from multiple ontologies. With more bio-ontologies being developed to describe different types of biological data and the increasing interest in using multiple ontologies to capture complex biological data, the ability to extract implicit relationships between different ontologies is becoming more important for biologists and tool developers who wish to utilize these ontologies and the data in them [22].
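
The ancestor-replacement generalization described above, and the two shortcomings attributed to it, can be illustrated with a minimal Python sketch. This is not the COLL procedure and not code from any of the cited work; the toy ontology, the term names (`root`, `A`, `B`, `A1`, `AB1`) and the helper functions `ancestors`, `generalize` and `confidence` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set

# Hypothetical toy ontology (not real GO terms): child -> list of parents.
# It is a DAG, so a term such as "AB1" may have more than one parent.
PARENTS: Dict[str, List[str]] = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "A1": ["A"],
    "AB1": ["A", "B"],
}


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Collect every term on any path from `term` to the root, excluding `term`."""
    seen: Set[str] = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def generalize(transaction: Set[str], parents: Dict[str, List[str]]) -> Set[str]:
    """Extend a set of annotated terms with all ancestors of each term."""
    generalized = set(transaction)
    for term in transaction:
        generalized |= ancestors(term, parents)
    return generalized


def confidence(lhs: str, rhs: str, transactions: List[Set[str]]) -> float:
    """Confidence of the rule lhs -> rhs: P(rhs in t | lhs in t)."""
    with_lhs = [t for t in transactions if lhs in t]
    if not with_lhs:
        return 0.0
    return sum(1 for t in with_lhs if rhs in t) / len(with_lhs)


# Three hypothetical gene products, each annotated with one or two terms.
raw = [{"A1"}, {"AB1"}, {"A1", "B"}]
generalized = [generalize(t, PARENTS) for t in raw]

# Shortcoming 1: child -> parent rules always hold with confidence 1.0.
child_parent_confidence = confidence("A1", "A", generalized)

# Shortcoming 2: the uninformative root term appears in every transaction.
root_in_all = all("root" in t for t in generalized)
```

In this sketch, any rule from a term to one of its ancestors holds trivially, and the root co-occurs with everything, which is why rules mined from such transactions need filtering or a different generalization strategy.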

Materials and Methods
Results and Discussion
Evaluation Category
Conclusion