G-Links: a gene-centric link acquisition service

Kazuki Oshita, Masaru Tomita, Kazuharu Arakawa

doi:10.12688/f1000research.5754.1

Abstract

With the availability of numerous curated databases, researchers can now make efficient use of the multitude of biological data by integrating these resources via hyperlinks and cross-references. A large proportion of bioinformatics research, however, still involves labor-intensive tasks such as fetching, parsing, and merging datasets and functional annotations from distributed multi-domain databases. This data integration issue is one of the key challenges in bioinformatics. We aim to solve this problem with a service named G-Links, which 1) gathers resource URI information from 130 databases and 30 web services in a gene-centric manner so that users can retrieve all available links about a given gene, 2) provides a RESTful API for easy retrieval of links, including facet searching based on keywords and/or predicate types, and 3) produces a variety of outputs: a visual HTML page, tab-delimited text, and Semantic Web formats such as Notation3 and RDF. G-Links and relevant documentation are available at http://link.g-language.org/

Highlights

The use of large-scale data or multi-domain information is becoming a prerequisite in all fields of molecular biology, in light of the advent of high-throughput measurement technologies, exemplified by new-generation DNA sequencers, and further driven by conceptual progress in integrative systems biology approaches
Here we describe a new RESTful service named G-Links, which gathers Uniform Resource Identifiers (URIs) from more than 100 databases in a gene-centric manner and provides a querying interface based on gene sets for hundreds of species
G-Links is available at http://link.g-language.org/ as a RESTful web service, which is suited to resource-centric access and is highly accessible via HyperText Transfer Protocol (HTTP)
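
To illustrate what resource-centric access over HTTP could look like in practice, the following is a minimal sketch of composing a request URL for a single gene. Only the base URL comes from the text above. The path layout (`<gene id>/format=<fmt>/filter=<keyword>`), the option names `format` and `filter`, the helper `build_glinks_url`, and the example identifier are assumptions made for illustration and should be checked against the service documentation.

```python
from urllib.parse import quote

# Service root as given in the text.
BASE_URL = "http://link.g-language.org/"


def build_glinks_url(gene_id, output_format=None, keyword=None):
    """Compose a request URL for one gene identifier.

    ASSUMPTION: the path layout <base>/<gene id>[/format=...][/filter=...]
    and the option names are hypothetical, not confirmed by the text.
    """
    # Percent-encode the identifier, keeping ':' for namespaced IDs.
    segments = [quote(gene_id, safe=":")]
    if output_format is not None:
        # e.g. tab-delimited text, Notation3, or RDF output
        segments.append("format=" + quote(output_format, safe=""))
    if keyword is not None:
        # keyword-based facet filtering of the returned links
        segments.append("filter=" + quote(keyword, safe=""))
    return BASE_URL + "/".join(segments)


# "P0A7G6" is an arbitrary identifier used only for illustration.
url = build_glinks_url("P0A7G6", output_format="tsv", keyword="KEGG")
print(url)
```

Because each gene maps to a plain URL, the resulting address could be fetched with any HTTP client, for instance `urllib.request.urlopen(url)` in Python, without a dedicated client library.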

Summary

Introduction

The use of large-scale data or multi-domain information is becoming a prerequisite in all fields of molecular biology, in light of the advent of high-throughput measurement technologies, exemplified by new-generation DNA sequencers, and further driven by conceptual progress in integrative systems biology approaches. Bioinformatics researchers need to collect and integrate data from a variety of sources, each with diverse syntax, semantics, protocols, identifiers and naming conventions (Bhagat et al., 2010; Brazas et al., 2012; Katayama et al., 2010). This data integration issue is one of the key challenges in the field of bioinformatics (Stein, 2002; Stein, 2008). In the current state of Semantic Web technologies, cross-domain queries require extensive reasoning or manual curation of ontologies, so the cross-reference-based approach still has an advantage in terms of user experience, with lower latency.

Objectives

Results

Conclusion