Abstract

Biological data, and particularly annotation data, are increasingly being represented in directed acyclic graphs (DAGs). However, while relevant biological information is implicit in the links between multiple domains, annotations from these different domains are usually represented in distinct, unconnected DAGs, making links between the domains represented difficult to determine. We develop a novel family of general statistical tests for the discovery of strong associations between two directed acyclic graphs. Our method takes the topology of the input graphs and the specificity and relevance of associations between nodes into consideration. We apply our method to the extraction of associations between biomedical ontologies in an extensive use-case. Through a manual and an automatic evaluation, we show that our tests discover biologically relevant relations. The suite of statistical tests we develop for this purpose is implemented and freely available for download.

Highlights

  • An increasing number of discoveries in biomedicine are facilitated by statistical analyses of data annotated to biomedical ontologies [1]

  • We evaluated our statistical method through an extensive use-case in which we applied our tests to the detection of strong semantic associations between the Gene Ontology [3] and the Celltype Ontology [6] based on co-occurrence in scientific literature

  • Ontologies as graphs: while the tests we develop can be applied to any directed acyclic graph (DAG) that satisfies the conditions specified above, their primary application is to test the significance of an association between categories from two ontologies


Summary

Introduction

An increasing number of discoveries in biomedicine are facilitated by statistical analyses of data annotated to biomedical ontologies [1]. We evaluated our statistical method through an extensive use-case in which we applied our tests to the detection of strong semantic associations between the Gene Ontology [3] and the Celltype Ontology [6] based on co-occurrence in scientific literature. In this use-case, we annotated the ontologies with occurrence and co-occurrence counts of the ontologies' category labels in full-text scientific articles. Hybrid similarity measures that combine node- and edge-based approaches have been developed. Most of these approaches utilize the information content. When applying our method to semantic similarity between ontologies, we can compute initial semantic similarity values for categories that do not belong to the same ontology.
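To make the notion of information content concrete, the following is a minimal sketch of how it is commonly derived from occurrence counts on a DAG. It is not the statistical test developed in this work. It assumes the widely used definition IC(c) = log(N / n_c), where n_c is the number of occurrences of category c or any of its descendants and N is the total number of occurrences. The toy ontology, the counts, and all function names are hypothetical and serve only as illustration.

```python
import math

# Hypothetical toy ontology: each category maps to its parent categories.
PARENTS = {
    "root": [],
    "cell": ["root"],
    "neuron": ["cell"],
    "glial cell": ["cell"],
    "astrocyte": ["glial cell"],
}

# Hypothetical occurrence counts of each category label in a text corpus.
COUNTS = {"root": 0, "cell": 10, "neuron": 25, "glial cell": 5, "astrocyte": 10}


def ancestors(node, parents):
    """Return the node itself plus every category reachable via parent links."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def information_content(parents, counts):
    """Compute IC(c) = log(N / n_c) for every category with n_c > 0."""
    # Credit each occurrence to the category and to all of its ancestors.
    # Using the ancestor set avoids double counting along multiple paths.
    cumulative = {node: 0 for node in parents}
    for node, count in counts.items():
        for anc in ancestors(node, parents):
            cumulative[anc] += count
    total = sum(counts.values())
    # Categories that never occur (directly or via descendants) are skipped,
    # because their information content would be infinite.
    return {
        node: math.log(total / n)
        for node, n in cumulative.items()
        if n > 0
    }


IC = information_content(PARENTS, COUNTS)
for category, value in sorted(IC.items(), key=lambda item: item[1]):
    print(f"{category}: {value:.3f}")
```

Under these assumptions, more specific categories receive higher values than their ancestors, which is the property that lets a measure account for the specificity of an association.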

