Abstract

In the life sciences, researchers increasingly want to access multiple databases in an integrated way. However, different databases currently use different formats and vocabularies, hindering the proper integration of heterogeneous life science data. Adopting the Resource Description Framework (RDF) has the potential to address such issues by improving database interoperability, leading to advances in automatic data processing. Based on this idea, we have advised many Japanese database development groups to expose their databases in RDF. To further promote such activities, we have developed an RDF-based life science dataset repository called the National Bioscience Database Center (NBDC) RDF portal. All the datasets in this repository have been reviewed by the NBDC to ensure interoperability and queryability. As of July 2018, the service includes 21 RDF datasets, comprising over 45.5 billion triples. It provides SPARQL endpoints for all datasets, useful metadata and the ability to download RDF files. The NBDC RDF portal can be accessed at https://integbio.jp/rdf/.
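As a rough illustration of what a per-dataset SPARQL endpoint makes possible, the sketch below builds a SPARQL 1.1 Protocol GET request in Python. The endpoint URL, the query and the helper name `build_request_url` are hypothetical placeholders chosen for illustration; the actual endpoint addresses should be taken from the portal itself.

```python
from urllib.parse import urlencode

# Hypothetical endpoint URL (an assumption, not the portal's real address).
ENDPOINT = "https://example.org/sparql"

# A generic query that lists ten arbitrary triples from a dataset.
QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
"""


def build_request_url(endpoint: str, query: str) -> str:
    """Return a GET URL carrying the query in the `query` parameter,
    as described by the SPARQL 1.1 Protocol."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_request_url(ENDPOINT, QUERY)
print(url)
```

Sending this URL with an `Accept: application/sparql-results+json` header would typically return the result bindings as JSON, assuming the endpoint supports that result format.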

Highlights

  • In the life sciences, enormous amounts of diverse data are continually being produced and numerous databases have been made available on the Internet [1]

  • The same resource may be referenced by different Uniform Resource Identifiers (URIs), which is one of the factors that hinder Resource Description Framework (RDF) dataset interoperability

  • From left to right: the RefEx ID, the expression value of probe 210049_at in RefEx, the URI of the compound to which the Open TG-GATEs sample was exposed and the expression value of probe 210049_at in Open TG-GATEs

  • We provide some examples of SPARQL queries that query multiple datasets in the documents section of the RDF portal

Summary

Introduction

Enormous amounts of diverse data are continually being produced and numerous databases have been made available on the Internet [1]. Life science data are currently provided in a wide variety of formats, such as flat files and dump files from relational database management systems (RDBMSs), as well as in JavaScript Object Notation, Extensible Markup Language and comma-separated values (CSV) formats. It is often extremely time-consuming for users to extract the necessary data from these diverse sources and construct a dataset for use in their research. This article describes our new RDF repository service, the NBDC RDF portal, in detail.

Background to creating the guidelines
Results
Discussion