Abstract

In the last few years, the Life Sciences domain has experienced rapid growth in the number of available biological databases. The heterogeneity of these databases makes data integration challenging: resources must be located, relationships between them identified, differing data formats reconciled, and synonyms and ambiguous terms resolved. The Linked Data approach, a set of best practices for publishing and connecting structured data on the Web, partially solves these heterogeneity problems by introducing a uniform data representation model. This article introduces kpath, a database that integrates information related to metabolic pathways. kpath also provides a navigational interface that supports both browsing the integrated data and using it in depth to build metabolic networks from existing, dispersed knowledge. This user interface has been used to showcase relationships that can be inferred from the information available in several public databases.

Database URL: The public Linked Data repository can be queried at http://sparql.kpath.khaos.uma.es using the graph URI “www.khaos.uma.es/metabolic-pathways-app”. The GUI providing navigational access to the kpath database is available at http://browser.kpath.khaos.uma.es.
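
As an illustration of how the repository address and graph URI quoted above fit together, the following minimal sketch builds a SPARQL 1.1 Protocol GET request in Python. It rests on assumptions not stated in the abstract: that the endpoint accepts the standard `query` and `default-graph-uri` parameters at the given address, and that it can return JSON results. The helper names `build_sparql_url` and `run_query` are hypothetical, and the sample query is a generic probe that assumes nothing about the kpath vocabulary.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint address and graph URI exactly as quoted in the abstract.
ENDPOINT = "http://sparql.kpath.khaos.uma.es"
GRAPH_URI = "www.khaos.uma.es/metabolic-pathways-app"

# Generic probe: lists a few triples without assuming any kpath predicates.
SAMPLE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_sparql_url(query, endpoint=ENDPOINT, graph=GRAPH_URI):
    """Build a GET URL using the standard SPARQL 1.1 Protocol parameters."""
    params = {"query": query, "default-graph-uri": graph}
    return endpoint + "?" + urlencode(params)


def run_query(query):
    """Send the query and decode a JSON result set.

    Assumption: the server honours the Accept header below and is reachable
    at the address given in the abstract.
    """
    request = Request(
        build_sparql_url(query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the request URL; call run_query(SAMPLE_QUERY) to fetch data.
    print(build_sparql_url(SAMPLE_QUERY))
```

Passing the graph through `default-graph-uri` keeps the query text itself free of the graph name, so the same query can be reused against another graph by changing one argument.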

Highlights

  • Over the last two decades, the biological database community has witnessed a rapid growth in the number of available data sources

  • BioCyc is a collection of genome/pathway databases (PGDBs) for eukaryotic and prokaryotic species whose genomes have been completely sequenced [6]. The collection comprises 5500 PGDBs: 5493 that are in the process of literature-based curation and 7 that have been under curation for at least a year, namely MetaCyc [6], HumanCyc [7], PlantCyc [8], AraCyc [9], LeishCyc [6], TrypanoCyc [10] and YeastCyc [11]

  • We present an approach to integrate pathway data from four different Linked Data repositories. kpath takes the Bio2RDF KEGG data as its core and extends it with organism data from NCBI Taxonomy [35], protein data from SwissProt [36], and related pathway data extracted from the Bio2RDF Reactome distribution [3]

Summary

Introduction

Over the last two decades, the biological database community has witnessed a rapid growth in the number of available data sources. AraCyc is another plant-related database; it stores pathway information about Arabidopsis drawn from the literature and displays the data in a user-friendly interface. If a user wants to see all the pathways from KEGG and Reactome that involve a specific metabolite, they have to run two independent searches, one for each database.
