Abstract

Background: Biological databases store data about laboratory experiments, together with semantic annotations, in order to support data aggregation and retrieval. The exact meaning of such annotations in the context of a database record is often ambiguous. We address this problem by grounding implicit and explicit database content in a formal-ontological framework.

Methods: By using a typical extract from the databases UniProt and Ensembl, annotated with content from GO, PR, ChEBI and NCBI Taxonomy, we created four ontological models (in OWL), which generate explicit, distinct interpretations under the BioTopLite2 (BTL2) upper-level ontology. The first three models interpret database entries as individuals (IND), defined classes (SUBC), and classes with dispositions (DISP), respectively; the fourth model (HYBR) is a combination of SUBC and DISP. For the evaluation of these four models, we consider (i) database content retrieval, using ontologies as query vocabulary; (ii) information completeness; and (iii) DL complexity and decidability. The models were tested under these criteria against four competency questions (CQs).

Results: IND does not raise any ontological claim, besides asserting the existence of sample individuals and relations among them. Modelling patterns have to be created for each type of annotation referent. SUBC interprets database entries as maximally fine-grained defined subclasses of the classes referred to by the data. DISP attempts to extract truly ontological statements from the database records, claiming the existence of dispositions. HYBR is a hybrid of SUBC and DISP and is more parsimonious regarding expressiveness and query answering complexity. For each of the four models, the four CQs were submitted as DL queries. The queries retrieve individuals under IND and classes under SUBC and HYBR. DISP does not retrieve anything because the axioms with dispositions are embedded in General Class Inclusion (GCI) statements.

Conclusion: Ambiguity of biological database content is addressed by a method that identifies implicit knowledge behind semantic annotations in biological databases and grounds it in an expressive upper-level ontology. The result is a seamless representation of database structure, content and annotations as OWL models.
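As a rough illustration of the difference between the IND and SUBC readings described in the abstract, the following toy sketch encodes one database record both ways and runs the same query against each. It is not the authors' OWL models and not a DL reasoner: the record, the class names (`ProteinX`, `ProcessY`, `TaxonZ`), the relation names (`participates_in`, `part_of`) and all function names are hypothetical placeholders, and subsumption is approximated by a simple subset check on restrictions.

```python
# Hypothetical record: identifiers and relation names are illustrative
# placeholders, not taken from UniProt, GO, NCBI Taxonomy or BTL2.
RECORD = {"protein": "ProteinX", "process": "ProcessY", "taxon": "TaxonZ"}

# A query is a base class plus existential restrictions (property, filler class).
QUERY = ("ProteinX", frozenset({("participates_in", "ProcessY")}))


def build_ind(record):
    """IND reading: the record becomes sample individuals linked by relations."""
    types = {
        "protein_1": record["protein"],
        "process_1": record["process"],
        "organism_1": record["taxon"],
    }
    relations = {
        ("protein_1", "participates_in", "process_1"),
        ("protein_1", "part_of", "organism_1"),
    }
    return types, relations


def build_subc(record):
    """SUBC reading: the record becomes one defined class
    (base class plus a set of restrictions)."""
    definition = (
        record["protein"],
        frozenset({
            ("participates_in", record["process"]),
            ("part_of", record["taxon"]),
        }),
    )
    return {"Record_1": definition}


def query_ind(types, relations, query):
    """Return individuals of the base class that have every required relation."""
    base, restrictions = query
    hits = []
    for ind, cls in types.items():
        if cls != base:
            continue
        satisfied = all(
            any(s == ind and p == prop and types[o] == filler
                for (s, p, o) in relations)
            for (prop, filler) in restrictions
        )
        if satisfied:
            hits.append(ind)
    return sorted(hits)


def query_subc(classes, query):
    """Return defined classes that match the query. Toy structural check:
    same base class, and the query restrictions are a subset of the definition."""
    base, restrictions = query
    return sorted(
        name for name, (cbase, crestr) in classes.items()
        if cbase == base and restrictions <= crestr
    )


ind_types, ind_relations = build_ind(RECORD)
subc_classes = build_subc(RECORD)
ind_hits = query_ind(ind_types, ind_relations, QUERY)
subc_hits = query_subc(subc_classes, QUERY)
print("IND  ->", ind_hits)
print("SUBC ->", subc_hits)
```

Under these assumptions the same query returns an individual in the IND encoding and a class name in the SUBC encoding, which mirrors the retrieval behaviour reported in the Results. A real evaluation would load the OWL models into a DL reasoner instead.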

Highlights

  • Biological databases store data about laboratory experiments, together with semantic annotations, in order to support data aggregation and retrieval

  • Apart from the OWL profiles required, the results show that individuals can be retrieved with IND, and classes with two-step queries for SUBC and for the hybrid representation with subclasses and dispositions (HYBR)

  • IND has more axioms than SUBC, DISP and HYBR because of the large number of relationships created among the individuals, even though an OWL model following the IND strategy may not include any class definitions


Summary

Introduction

Biological databases store data about laboratory experiments, together with semantic annotations, in order to support data aggregation and retrieval. Database records from the Unified Protein Resource (UniProt) [1] are annotated with. Although these domain ontologies, in isolation, obey formal principles and good practice guidelines [4, 5], the meaning of the annotations themselves has so far been formalized very little. UniProt Core includes a description of how database fields relate to each other, but without formalization and without links to GO (for example). This can be a source of misunderstanding and can hamper correct data interpretation, leading to doubtful or wrong conclusions.

