Abstract

One feature of cloud storage systems is data fragmentation (or sharding), so that data can be distributed over multiple servers and subqueries can be run in parallel on the fragments. On the other hand, flexible query answering can enable a database system to find related information for a user whose original query cannot be answered exactly. Query generalization is a way to implement flexible query answering at the syntax level. In this paper, we study a clustering-based fragmentation for the generalization operator Anti-Instantiation, with which related information can be found in distributed data. We use a standard clustering algorithm to derive a semantic fragmentation of the data in the database. The database system uses the derived fragments to support an intelligent flexible query answering mechanism that avoids overgeneralization but supports data replication in a distributed database system. We show that the data replication problem can be expressed as a special Bin Packing Problem and can hence be solved by an off-the-shelf solver for integer linear programs. We present a prototype system that makes use of a medical taxonomy to determine similarities between medical expressions.
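For orientation, the textbook Bin Packing Problem can be written as the integer linear program below. This is only a sketch of the standard formulation, not the paper's special variant, whose additional constraints are not given in this excerpt. The mapping to replication is an assumption made for illustration: the bins i = 1, ..., K are taken to be servers of capacity W, the items j = 1, ..., n are fragments of size w_j, y_i indicates that server i is used, and x_{ij} indicates that fragment j is placed on server i.

```latex
\begin{align*}
\text{minimize}   \quad & \sum_{i=1}^{K} y_i \\
\text{subject to} \quad & \sum_{j=1}^{n} w_j \, x_{ij} \le W \, y_i, && i = 1, \dots, K, \\
                        & \sum_{i=1}^{K} x_{ij} = 1,                 && j = 1, \dots, n, \\
                        & y_i \in \{0, 1\}, \quad x_{ij} \in \{0, 1\}, && i = 1, \dots, K, \; j = 1, \dots, n.
\end{align*}
```

The first constraint keeps the load of each used server within its capacity, and the second places every fragment on exactly one server. A model of this shape can be handed to a general-purpose ILP solver unchanged.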

Highlights

  • In the era of “big data” huge data sets usually cannot be stored on a single server any longer

  • Clustering-based fragmentation: we present our intelligent fragmentation and replication procedure that will support flexible query answering with anti-instantiation

  • Some approaches have used taxonomies or ontologies for flexible query answering but did not consider their application to the distributed storage of data: CoBase [26] used a type abstraction hierarchy to generalize values; Shin et al. [27] use a specific notion of metric distance in a knowledge abstraction hierarchy to identify semantically related answers; Halder and Cortesi [28] define a partial order between cooperative answers based on their abstract interpretation framework; Muslea [29] discusses the relaxation of queries in disjunctive normal form


Summary

Introduction

In the era of “big data”, huge data sets usually can no longer be stored on a single server. In a cloud storage system, a distributed database management system (DDBMS) can be used to manage the data in a network of servers. Conventional database systems usually return an empty answer to a failing query. In most cases this is undesirable for the user, who has to revise the query and send the revised query to the database system in order to get any information from the database. We present a detailed query rewriting and query redirecting method that allows access to the distributed fragments; this was discussed only briefly in [2]. Section “Clustering-based fragmentation” presents the main contribution on clustering-based fragmentation and its management with a lookup table, whereas Section “Query rewriting” describes how to decompose a query so that it can be distributed among the servers. Section “Related work” surveys related work, and Section “Discussion and conclusion” concludes the paper.
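To make the idea of generalizing a failing query concrete, the following minimal sketch enumerates the anti-instantiations of a conjunctive query: one occurrence of a constant, or of a variable that occurs at least twice, is replaced by a fresh variable. The query encoding (a list of predicate/argument pairs), the "?" prefix for variables, the function names, and the example predicates `ill` and `info` are all assumptions made for illustration and are not taken from the paper's prototype.

```python
from itertools import count


def is_variable(term):
    # Assumed convention for this sketch: variable names start with "?".
    return term.startswith("?")


def anti_instantiations(query):
    """Yield each query obtained by replacing one occurrence of a constant,
    or of a variable that occurs at least twice, with a fresh variable.

    `query` is a list of (predicate, args) pairs, where args is a tuple of
    strings. The input list is not modified.
    """
    all_terms = [t for _, args in query for t in args]
    # Pick a variable name that does not occur anywhere in the query.
    fresh = next(f"?v{i}" for i in count() if f"?v{i}" not in all_terms)
    for i, (pred, args) in enumerate(query):
        for j, term in enumerate(args):
            # A variable with a single occurrence is already as general
            # as possible, so replacing it would gain nothing.
            if is_variable(term) and all_terms.count(term) < 2:
                continue
            new_args = args[:j] + (fresh,) + args[j + 1:]
            yield query[:i] + [(pred, new_args)] + query[i + 1:]


if __name__ == "__main__":
    # Hypothetical query: patients with a broken arm, joined with their info.
    q = [("ill", ("?x", "brokenArm")), ("info", ("?x", "?y"))]
    for generalized in anti_instantiations(q):
        print(generalized)
```

Replacing the constant `brokenArm` lets the query match any diagnosis, which illustrates the overgeneralization risk: restricting the fresh variable to the values stored in one semantically coherent fragment keeps the answers related to the original term.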

Background
2: Choose one literal Lj where t occurs
Fracture brokenArm 2 NULL
Discussion and conclusion