Abstract

A key property of linked data, i.e., the web-based representation and publication of data as interconnected labeled graphs, is that it enables querying and navigating through datasets distributed across the network. SPARQL 1.1, the current standard query language for RDF-based linked data, defines a construct called property paths (PP) to navigate between the entities of a graph. This is potentially very useful in a number of use cases, e.g., in the biomedical domain, where large datasets are available as linked data graphs. However, the use of PP in SPARQL 1.1 is possible only on a single local graph, which requires merging all distributed datasets into one large, centrally stored graph and therefore reduces the value of using linked data in the first place. We propose an index-based approach, called QPPDs, for answering path queries across multiple distributed datasets. We provide a heuristic-based source selection mechanism to select the relevant datasets (also called data sources) for a given path query, and a technique that federates queries to the selected sources and assembles (merges) the partial or complete paths retrieved from those remote datasets. We demonstrate our approach on a genomics use case, where the description of biological entities (e.g., genes, diseases, and drugs) is scattered across multiple datasets. In our preliminary investigation, we evaluate the QPPDs approach with real-world path queries on highly heterogeneous biological data, in terms of performance (overall path retrieval time) and result completeness, i.e., the number of paths retrieved.

Highlights

  • The potential benefits of using Linked Data have been increasingly considered in a variety of domains where rich, multi-source data need to be explored, e.g., bioinformatics, geography, and literature.

  • We present two motivating scenarios: (1) a real-world scenario showing the use of distributed property paths in RDF datasets for cancer genomics; and (2) a toy scenario that is used as a running example to explain the proposed approach.

  • This work is motivated by the needs of the BIOOPENER project, which aims at linking and discovering linked data across cancer and biomedical datasets in publicly available, distributed triple stores.


Summary

INTRODUCTION

The potential benefits of using Linked Data (also known as the Web of Data or Semantic Web data) have been increasingly considered in a variety of domains where rich, multi-source data need to be explored, e.g., bioinformatics, geography, and literature. In the biomedical domain, for example, a lot of data is publicly available from multiple, heterogeneous sources. In such a case, it is very common for two biological entities (e.g., gene, protein, drug, pathway, etc.) to be related through paths formed of links going across several of those datasets. To find paths between two entities, the centralized approaches adopted by current systems pose several challenges: (i) querying multiple datasets requires the user to first merge them into a single graph, which is a cumbersome task; (ii) copied data need to be synchronized; (iii) merged data might not be as up to date as in the original source; (iv) data is not always under the control of, or fully accessible by, the person querying it; and (v) scalability is a major issue in centralized approaches.
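
To make the idea of a path that crosses several datasets concrete, the following is a minimal sketch in Python. It is not the QPPDs algorithm: the toy datasets, the entity names, and the functions `build_index`, `outgoing`, and `find_paths` are all hypothetical and introduced only for illustration. The sketch assumes that each dataset can be asked for the edges leaving one entity (standing in for a remote query) and that a small index records which datasets describe which entity, so that only relevant sources are contacted and no merged graph is ever built.

```python
from collections import deque

# Hypothetical toy data: each "dataset" is a list of (subject, predicate, object) triples.
DATASETS = {
    "genes": [("gene:A", "encodes", "protein:P")],
    "drugs": [
        ("protein:P", "targetedBy", "drug:D"),
        ("drug:D", "treats", "disease:X"),
    ],
    "pathways": [("protein:P", "partOf", "pathway:W")],
}


def build_index(datasets):
    """Map each subject to the set of datasets that hold its outgoing edges."""
    index = {}
    for name, triples in datasets.items():
        for subject, _, _ in triples:
            index.setdefault(subject, set()).add(name)
    return index


def outgoing(datasets, source, node):
    """Stand-in for a remote query: edges leaving `node` in one dataset."""
    return [(s, p, o) for (s, p, o) in datasets[source] if s == node]


def find_paths(datasets, start, goal, max_hops=4):
    """Breadth-first search that extends partial paths one source at a time."""
    index = build_index(datasets)
    paths = []
    queue = deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if node == goal and path:
            paths.append(path)  # a complete path, possibly spanning sources
            continue
        if len(path) >= max_hops:
            continue
        visited = {start} | {o for (_, _, o, _) in path}
        # Source selection: only contact datasets the index lists for this node.
        for source in sorted(index.get(node, ())):
            for s, p, o in outgoing(datasets, source, node):
                if o not in visited:  # avoid cycles within a single path
                    queue.append((o, path + [(s, p, o, source)]))
    return paths


if __name__ == "__main__":
    for path in find_paths(DATASETS, "gene:A", "disease:X"):
        print(" -> ".join(f"{s} [{p} @ {src}]" for s, p, _, src in path), "->", path[-1][2])
```

In this toy setting, the path from `gene:A` to `disease:X` starts in the `genes` dataset and continues in the `drugs` dataset. Each edge records the source it came from, which illustrates how partial paths retrieved from different datasets can be assembled into one.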

MOTIVATING SCENARIO
PRELIMINARIES
RELATED WORK
THE QPPDS APPROACH
EVALUATION
CONCLUSION