Abstract

Big data plays a significant role in promoting both manufacturing and scientific development through industrial digitization and emerging interdisciplinary research. Semantic web technologies have also experienced great progress, and scientific communities and practitioners have contributed to the problem of big data management with ontological models, controlled vocabularies, linked datasets, data models, query languages, as well as tools for transforming big data into knowledge from which decisions can be made. Despite the significant impact of big data and semantic web technologies, we are entering a new era where domains like genomics are projected to grow very rapidly in the next decade. In this next era, integrating big data demands novel and scalable tools for enabling not only big data ingestion and curation but also efficient large-scale exploration and discovery. Federated query processing techniques provide a solution to scale up to large volumes of data distributed across multiple data sources. These techniques resort to source descriptions to identify relevant data sources for a query, as well as to find efficient execution plans that minimize the total execution time of a query and maximize the completeness of the answers. This chapter summarizes the main characteristics of a federated query engine, reviews the current state of the field, and outlines the problems that remain open and represent grand challenges for the area.

Highlights

  • Federated query processing techniques provide a solution to scale up to large volumes of data distributed across multiple data sources

  • RDF Molecule Templates (RDF-MTs) are merged based on their semantic descriptions defined by the ontology, e.g., in RDFS

  • SPLENDID provides a hybrid solution by combining Vocabulary of Interlinked Datasets (VoID) descriptions for data source selection along with SPARQL ASK queries submitted to each dataset at run-time for verification
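
The hybrid strategy described for SPLENDID can be sketched in two phases: prune candidate sources with VoID-style predicate descriptions, then confirm each survivor with a SPARQL ASK probe at run time. The following Python sketch simulates this under illustrative assumptions; the endpoint URLs, data, and function names are hypothetical and the ASK check is evaluated against in-memory triples rather than real SPARQL endpoints.

```python
# VoID-style descriptions: the predicates each source claims to serve.
VOID_DESCRIPTIONS = {
    "http://endpoint-a.example/sparql": {"foaf:name", "foaf:knows"},
    "http://endpoint-b.example/sparql": {"dbo:birthPlace", "foaf:name"},
}

# Simulated endpoint contents, standing in for the remote datasets.
ENDPOINT_TRIPLES = {
    "http://endpoint-a.example/sparql": {("ex:alice", "foaf:name", '"Alice"')},
    "http://endpoint-b.example/sparql": {("ex:bob", "dbo:birthPlace", "ex:Berlin")},
}

def ask(endpoint, pattern):
    """Simulate a SPARQL ASK query: does any triple at the endpoint
    match the pattern? Terms starting with '?' are variables."""
    def term_matches(term, pat):
        return pat.startswith("?") or term == pat
    return any(all(term_matches(t, p) for t, p in zip(triple, pattern))
               for triple in ENDPOINT_TRIPLES[endpoint])

def select_sources(pattern):
    """Two-phase source selection: VoID pruning, then ASK verification."""
    _, predicate, _ = pattern
    candidates = [ep for ep, preds in VOID_DESCRIPTIONS.items()
                  if predicate in preds]
    return [ep for ep in candidates if ask(ep, pattern)]

# Both endpoints advertise foaf:name, but only endpoint A actually holds
# a matching triple for this subject, so the ASK phase prunes endpoint B.
print(select_sources(("ex:alice", "foaf:name", "?name")))
# prints ['http://endpoint-a.example/sparql']
```

The run-time ASK phase is what makes the approach hybrid: static VoID statistics alone would have routed the query to both endpoints, producing a wasted request.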

Introduction

The number and variety of data collections have grown exponentially over recent decades, and a similar growth rate is expected in the coming years. Data is usually ingested in myriad unstructured formats and may suffer from reduced quality due to biases, ambiguities, and noise. These issues increase the complexity of data integration solutions. Techniques that solve interoperability issues while addressing the data complexity challenges imposed by big data characteristics are required [402]. Exemplary approaches include GEMMS [365], PolyWeb [244], BigDAWG [119], Ontario [125], and Constance [179]. These systems collect metadata about the main characteristics of the heterogeneous data collections, e.g., their formats and query capabilities. Rich descriptions of the properties and capabilities of the data have proven crucial for enabling these systems to perform query processing effectively.
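
As a concrete illustration of the kind of metadata such systems maintain, the following minimal Python sketch models a source catalog recording format and query capabilities per collection, in the spirit of systems like Ontario or PolyWeb. The record fields, source names, and capability labels are hypothetical, chosen only to show how a federation layer could route a query fragment to sources that can answer it.

```python
from dataclasses import dataclass, field

@dataclass
class SourceDescription:
    """Hypothetical metadata record about one heterogeneous data
    collection: its name, its data format, and the query languages
    its wrapper can accept."""
    name: str
    data_format: str                      # e.g. "RDF", "CSV", "JSON"
    query_capabilities: set = field(default_factory=set)

# An illustrative catalog of heterogeneous sources in a federation.
CATALOG = [
    SourceDescription("clinical-records", "CSV", {"SQL"}),
    SourceDescription("gene-annotations", "RDF", {"SPARQL"}),
    SourceDescription("publications", "JSON", {"MongoQL"}),
]

def sources_supporting(capability):
    """Select the sources to which a query fragment expressed in the
    given query language can be pushed down."""
    return [s.name for s in CATALOG if capability in s.query_capabilities]

print(sources_supporting("SPARQL"))  # prints ['gene-annotations']
```

Keeping such descriptions up to date is exactly what lets a federated engine decide, per subquery, which sources are relevant and which capabilities can be exploited during execution.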

Data Integration Systems
Classification of Data Integration Systems
Data Integration in the Era of Big Data
Federated Query Processing
Data Source Description
Query Decomposition and Source Selection
Query Planning and Optimization
Query Execution
Grand Challenges and Future Work