Abstract

Recently, various types of biological data, including genomic sequences, have been rapidly accumulating. To discover biological knowledge from such growing heterogeneous data, a flexible framework for data integration is necessary. Ortholog information is a central resource for interlinking corresponding genes among different organisms, and the Semantic Web provides a key technology for the flexible integration of heterogeneous data. We have constructed an ortholog database using Semantic Web technology, aiming at the integration of numerous genomic data and various types of biological information. To formalize the structure of ortholog information in the Semantic Web, we have constructed the Ortholog Ontology (OrthO). While OrthO is a compact ontology for general use, it is designed to be extended to describe database-specific concepts. On the basis of OrthO, we described the ortholog information from our Microbial Genome Database for Comparative Analysis (MBGD) in Resource Description Framework (RDF) form and made it available through a SPARQL endpoint, which accepts arbitrary queries specified by users. In this OrthO-based framework, the biological data of different organisms can be integrated using the ortholog information as a hub. In addition, ortholog information from different data sources can be compared using OrthO as a shared ontology. Here we show examples demonstrating that ortholog information described in RDF can be used to link various biological data such as taxonomy information and Gene Ontology. Thus, an ortholog database built on Semantic Web technology can contribute to biological knowledge discovery through integrative data analysis.

Highlights

  • Because of the rapid progress of biotechnology, various types of biological data, including genomic sequences, have been rapidly accumulating; their effective computational management appears to be a challenging issue in biological data analysis

  • We developed a general Resource Description Framework (RDF) model, based on the Ortholog Ontology, for describing ortholog information


Summary

Introduction

Because of the rapid progress of biotechnology, various types of biological data, including genomic sequences, have been rapidly accumulating, and managing them effectively by computational means has become a challenging issue in biological data analysis. To integrate such growing heterogeneous data, there is an urgent need to consolidate the key information that links biologically related resources to each other. Among the various biological resources, ortholog information can play a central role in integrating the biological data of multiple species. It links the corresponding genes of different species and allows the biological knowledge of model organisms to be transferred to organisms with newly sequenced genomes. In an era when numerous novel genome sequences are being determined, such computational knowledge transfer is becoming increasingly valuable. Genomic data integration using ortholog information, and comparative analysis based on it, are powerful approaches for biological knowledge discovery.
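The idea of using ortholog information as a hub for knowledge transfer can be illustrated with a minimal sketch. The Python below is not the MBGD implementation and does not use the OrthO vocabulary: the triples, the predicate names (`hasMember`, `annotatedWith`), and all gene, group, and term identifiers are hypothetical, chosen only to show how annotations on a model-organism gene might be reached from an unannotated gene through a shared ortholog group.

```python
# Toy triple store: (subject, predicate, object).
# Every identifier here is hypothetical and for illustration only.
TRIPLES = {
    ("group:G1", "hasMember", "gene:modelA"),
    ("group:G1", "hasMember", "gene:newA"),
    ("group:G2", "hasMember", "gene:modelB"),
    ("group:G2", "hasMember", "gene:newB"),
    ("gene:modelA", "annotatedWith", "term:T1"),
    ("gene:modelB", "annotatedWith", "term:T2"),
}


def objects(triples, subject, predicate):
    """Return all objects o such that (subject, predicate, o) is present."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def subjects(triples, predicate, obj):
    """Return all subjects s such that (s, predicate, obj) is present."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def transfer_annotations(triples, gene):
    """Collect annotations carried by the other members of the
    ortholog groups that contain `gene`."""
    inferred = set()
    for group in subjects(triples, "hasMember", gene):
        for other in objects(triples, group, "hasMember"):
            if other != gene:
                inferred |= objects(triples, other, "annotatedWith")
    return inferred


if __name__ == "__main__":
    print(transfer_annotations(TRIPLES, "gene:newA"))
```

In an RDF setting the same traversal would typically be written as a graph pattern in a query language such as SPARQL; the set-based lookup here only stands in for that pattern matching.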
