On K-means clustering-based approach for DDBSs design

Ali A. Amer

doi:10.1186/s40537-020-00306-9

Abstract

In Distributed Database Systems (DDBS), communication cost and response time have long been open challenges. A carefully designed DDBS, however, can achieve the desired reduction in communication cost. Data fragmentation (data clustering) and data allocation are the most widely used strategies for DDBS design. Many design techniques built on these strategies have been presented in the literature to improve DDBS performance, but they rely on either empirical results or data statistics, which makes most of them imperfect or invalid at least at the initial stage of DDBS design, when such information is typically not yet available. This paper therefore introduces a heuristic k-means approach for vertical fragmentation and allocation that focuses primarily on the initial stage of DDBS design and combines several techniques in a single step. A brief yet effective experimental study on both artificially created and real datasets has been conducted to demonstrate the optimality of the proposed approach in comparison with its counterparts, and the obtained results are encouraging.

Highlights

In recent years, significant progress has been made in Distributed Database Systems (DDBS) design
All requirements, such as queries and their frequencies, are assumed to be collected from the DDBS workload

Summary

Introduction

Significant progress has been made in DDBS design. This progress has concentrated on fragmentation and allocation techniques because of their critical impact on DDBS productivity in relational databases. Data allocation seeks to improve DDBS performance by placing the properly partitioned fragments at the sites where they are most needed. When data fragmentation and allocation are performed well, DDBS throughput is substantially improved. This improvement is usually achieved by minimizing access to irrelevant data stored at different sites while a distributed query is being processed (i.e., by minimizing data transmission). The contributions of this paper are summarized as follows:

Amer J Big Data (2020) 7:31

Objectives

Methods

Results

Discussion

Conclusion