Abstract

To support "Database as a service" (DaaS) in the cloud, the database system is expected to provide similar functionalities as in centralized DBMS such as efficient processing of ad hoc queries. The system must therefore support DBMS-like indexes, possibly a few indexes for each table to provide fast location of data distributed over the network. In such a distributed environment, the indexes have to be distributed over the network to achieve scalability and reliability. Each cluster node maintains a subset of the index data. As in conventional DBMS, indexes incur maintenance overhead and the problem is more complex in the distributed environment since the data are typically partitioned and distributed based on a subset of attributes. Further, the distribution of indexes is not straight forward, and there is therefore always the question of scalability, in terms of data volume, network size, and number of indexes. In this paper, we examine the problem of providing DBMS-like indexing mechanisms in cloud DaaS, and propose an extensible, but simple and efficient indexing framework that enables users to define their own indexes without knowing the structure of the underlying network. It is also designed to ensure the efficiency of hopping between cluster nodes during index traversal, and reduce the maintenance cost of indexes. We implement three common indexes, namely distributed hash indexes, distributed B + -tree-like indexes and distributed multi-dimensional indexes, to demonstrate the usability and effectiveness of the framework. We conduct experiments on Amazon EC2 and an in-house cluster to verify the efficiency and scalability of the framework.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call