Clustering-based fragmentation and data replication for flexible query answering in distributed databases

Lena Wiese

doi:10.1186/s13677-014-0018-0

Abstract

One feature of cloud storage systems is data fragmentation (or sharding), so that data can be distributed over multiple servers and subqueries can be run in parallel on the fragments. Flexible query answering, in turn, enables a database system to find related information for a user whose original query cannot be answered exactly. Query generalization is a way to implement flexible query answering on the syntax level. In this paper we study a clustering-based fragmentation for the generalization operator Anti-Instantiation, with which related information can be found in distributed data. We use a standard clustering algorithm to derive a semantic fragmentation of data in the database. The database system uses the derived fragments to support an intelligent flexible query answering mechanism that avoids overgeneralization but supports data replication in a distributed database system. We show that the data replication problem can be expressed as a special Bin Packing Problem and can hence be solved by an off-the-shelf solver for integer linear programs. We present a prototype system that makes use of a medical taxonomy to determine similarities between medical expressions.
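For orientation, the classical Bin Packing Problem can be written as an integer linear program as follows. This is only the textbook formulation, given under assumed notation: fragments $i = 1, \dots, n$ with sizes $w_i$, servers $k = 1, \dots, m$ with uniform capacity $W$, and binary variables $x_{ik}$ (fragment $i$ is placed on server $k$) and $y_k$ (server $k$ is used). The special variant used for data replication in the paper adds further constraints that are not reproduced here.

```latex
\begin{align*}
\text{minimize}\quad   & \sum_{k=1}^{m} y_k \\
\text{subject to}\quad & \sum_{k=1}^{m} x_{ik} = 1            && i = 1, \dots, n \\
                       & \sum_{i=1}^{n} w_i \, x_{ik} \le W \, y_k && k = 1, \dots, m \\
                       & x_{ik} \in \{0, 1\},\; y_k \in \{0, 1\}   && i = 1, \dots, n,\; k = 1, \dots, m
\end{align*}
```

The first constraint places every fragment on exactly one server, the second keeps each used server within its capacity, and the objective minimizes the number of servers in use.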

Highlights

In the era of “big data”, huge data sets usually can no longer be stored on a single server.
Clustering-based fragmentation: We present our intelligent fragmentation and replication procedure that supports flexible query answering with anti-instantiation.
Some approaches have used taxonomies or ontologies for flexible query answering but did not consider their application to the distributed storage of data: CoBase [26] used a type abstraction hierarchy to generalize values; Shin et al. [27] use a specific notion of metric distance in a knowledge abstraction hierarchy to identify semantically related answers; Halder and Cortesi [28] define a partial order between cooperative answers based on their abstract interpretation framework; and Muslea [29] discusses the relaxation of queries in disjunctive normal form.

Summary

Introduction

In the era of “big data”, huge data sets usually can no longer be stored on a single server. In a cloud storage system, a distributed database management system (DDBMS) can be used to manage the data in a network of servers. Conventional database systems usually return an empty answer to a failing query. In most cases, this is undesirable for the user, who has to revise the query and send the revised query to the database system in order to get some information from the database. We present a detailed query rewriting and query redirecting method that allows access to the distributed fragments; this was discussed only briefly in [2]. The section “Clustering-based fragmentation” presents the main contribution on clustering-based fragmentation and its management with a lookup table, whereas the section “Query rewriting” describes how to decompose a query so that it can be distributed among the servers. The section “Related work” surveys related work, and the section “Discussion and conclusion” concludes the paper.
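The idea of a clustering-based fragmentation managed through a lookup table can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the greedy leader-style clustering is a stand-in and not necessarily the standard clustering algorithm the paper uses, the threshold `alpha`, the `CATEGORY` table, and the `similarity` function are hypothetical replacements for a taxonomy-based similarity, and the rows are made-up data.

```python
# Hypothetical stand-in for a taxonomy-based similarity: two terms are
# similar (1.0) if they share a hand-assigned category, else 0.1.
CATEGORY = {"cough": "respiratory", "asthma": "respiratory",
            "brokenArm": "fracture", "brokenLeg": "fracture"}


def similarity(a, b):
    return 1.0 if CATEGORY[a] == CATEGORY[b] else 0.1


def cluster_terms(terms, sim, alpha):
    """Greedy leader clustering: put each term into the first cluster
    whose head is at least alpha-similar, else open a new cluster."""
    clusters = {}  # head term -> list of member terms
    for term in terms:
        for head in clusters:
            if sim(head, term) >= alpha:
                clusters[head].append(term)
                break
        else:
            clusters[term] = [term]
    return clusters


def build_lookup(clusters):
    """Map every term to the id of the fragment (cluster) it belongs to."""
    return {term: frag_id
            for frag_id, members in enumerate(clusters.values())
            for term in members}


def fragment_rows(rows, column, lookup):
    """Horizontally fragment rows by the cluster of the value in `column`."""
    fragments = {}
    for row in rows:
        fragments.setdefault(lookup[row[column]], []).append(row)
    return fragments


def related_rows(term, lookup, fragments):
    """Redirect a query on `term` to its fragment and return all rows there."""
    return fragments[lookup[term]]


# Made-up example data: (patient id, diagnosis).
rows = [(1, "cough"), (2, "brokenArm"), (3, "asthma"), (4, "brokenLeg")]
clusters = cluster_terms([r[1] for r in rows], similarity, alpha=0.5)
lookup = build_lookup(clusters)
fragments = fragment_rows(rows, 1, lookup)
print(related_rows("asthma", lookup, fragments))
```

In this sketch, a query for `asthma` is redirected to a single fragment and also returns the row for `cough`, because both values fall into the same cluster. Rows about fractures live in a different fragment and are not touched, which is the sense in which a semantic fragmentation can limit overgeneralization.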

Background

2: Choose one literal L_j where t occurs
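A minimal sketch of an anti-instantiation step follows, assuming the operator replaces one occurrence of a term t in one literal of a conjunctive query with a fresh variable. The query representation (a list of `(predicate, arguments)` tuples), the function names, and the example query are hypothetical and chosen only for illustration. Any side conditions the paper places on which terms may be replaced are not enforced here.

```python
def fresh_variable(query):
    """Return a variable name Y1, Y2, ... that does not occur in the query."""
    used = {arg for _, args in query for arg in args}
    i = 1
    while f"Y{i}" in used:
        i += 1
    return f"Y{i}"


def anti_instantiations(query, term):
    """Return all queries obtained by replacing exactly one occurrence
    of `term` in one literal with a fresh variable."""
    y = fresh_variable(query)
    results = []
    for j, (pred, args) in enumerate(query):
        for k, arg in enumerate(args):
            if arg == term:
                new_args = args[:k] + (y,) + args[k + 1:]
                results.append(query[:j] + [(pred, new_args)] + query[j + 1:])
    return results


# Hypothetical query: ill(X, cough) AND info(X, Address)
query = [("ill", ("X", "cough")), ("info", ("X", "Address"))]
for generalized in anti_instantiations(query, "cough"):
    print(generalized)
```

Replacing the constant `cough` yields a single generalized query in which the diagnosis is unconstrained. Replacing the variable `X` yields two queries, one per literal in which `X` occurs, each of which breaks the join between the two literals.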

Fracture brokenArm 2 NULL

Discussion and conclusion

Journal: Journal of Cloud Computing	Publication Date: Oct 28, 2014
Citations: 39	License type: CC BY 4.0
