Abstract

BackgroundWith the advent of inexpensive assay technologies, there has been an unprecedented growth in genomics data as well as the number of databases in which it is stored. In these databases, sample annotation using ontologies and controlled vocabularies is becoming more common. However, the annotation is rarely available as Linked Data, in a machine-readable format, or for standardized queries using SPARQL. This makes large-scale reuse, or integration with other knowledge bases very difficult.MethodsTo address this challenge, we have developed the second generation of our eXframe platform, a reusable framework for creating online repositories of genomics experiments. This second generation model now publishes Semantic Web data. To accomplish this, we created an experiment model that covers provenance, citations, external links, assays, biomaterials used in the experiment, and the data collected during the process. The elements of our model are mapped to classes and properties from various established biomedical ontologies. Resource Description Framework (RDF) data is automatically produced using these mappings and indexed in an RDF store with a built-in Sparql Protocol and RDF Query Language (SPARQL) endpoint.ConclusionsUsing the open-source eXframe software, institutions and laboratories can create Semantic Web repositories of their experiments, integrate it with heterogeneous resources and make it interoperable with the vast Semantic Web of biomedical knowledge.

Highlights

  • With the advent of inexpensive assay technologies, there has been an unprecedented growth in genomics data as well as the number of databases in which it is stored

  • Several other standards were published as new technologies evolved and the Minimum Information for Biological and Biomedical Investigations guideline was proposed for reporting all types of biomedical experiments [6]

  • Results based on approximately half of the datasets (86) have been published in scientific journals, and these datasets are available to the public

Read more

Summary

Introduction

With the advent of inexpensive assay technologies, there has been an unprecedented growth in genomics data as well as the number of databases in which it is stored. In these databases, sample annotation using ontologies and controlled vocabularies is becoming more common. The annotation is rarely available as Linked Data, in a machine-readable format, or for standardized queries using SPARQL. This makes large-scale reuse, or integration with other knowledge bases very difficult. The first standard to be published provided requirements for the Minimum Information About a Microarray Experiment (MIAME) [5]. The major public repositories of genomics experiments, Gene Expression Omnibus (GEO) [7] and ArrayExpress [8], are compliant with these standards

Objectives
Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call