An Efficient Distributed Data Management Method based key Columns Partition Preprocessing

Xu Tao, Li Baolu, Zhang Wei

doi:10.14257/ijdta.2015.8.3.17

Abstract

With the development of the mobile internet and social networks, the scale of structured data has rapidly grown to the petabyte (PB) level and beyond, while query performance has degraded significantly. Query optimization on large-scale datasets is currently a research focus in both academia and industry. In this paper, we present KCSQ, a distributed data management method designed to improve query performance. KCSQ analyses historical SQL commands, derives statistics on the usage frequency and coupling degree of tables and table columns, and selects the key column based on this statistical evidence. When new tables are imported into HDFS, the data are divided into blocks according to their key column. A query on that column then touches less data and fewer working nodes, which effectively improves the throughput of the system.
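
The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the query log, the `orders` table, its columns, the regex-based predicate extraction, and the modulo block assignment are all assumptions made for illustration. The abstract does not define the coupling degree, so the sketch selects the key column by predicate frequency only, and it assumes an integer-valued key column.

```python
import re
from collections import Counter, defaultdict

# Hypothetical query log; table and column names are invented for illustration.
HISTORY = [
    "SELECT * FROM orders WHERE user_id = 7",
    "SELECT * FROM orders WHERE user_id = 9 AND status = 'paid'",
    "SELECT * FROM orders WHERE status = 'open'",
    "SELECT * FROM orders WHERE user_id = 3",
]

# Naive assumption: an identifier directly followed by '=' is a predicate column.
PREDICATE = re.compile(r"\b([A-Za-z_]\w*)\s*=")


def column_frequencies(queries):
    """Count how often each column appears in an equality predicate."""
    counts = Counter()
    for query in queries:
        # Only inspect the text after WHERE, so the select list is ignored.
        _, _, where_clause = query.partition(" WHERE ")
        counts.update(PREDICATE.findall(where_clause))
    return counts


def choose_key_column(queries):
    """Pick the most frequently filtered column as the key column."""
    return column_frequencies(queries).most_common(1)[0][0]


def partition_rows(rows, key, num_blocks):
    """Assign each row to a block based on its (integer) key column value."""
    blocks = defaultdict(list)
    for row in rows:
        blocks[row[key] % num_blocks].append(row)
    return blocks


def lookup(blocks, key, value, num_blocks):
    """Answer `key = value` by scanning only the one block that can hold it."""
    candidates = blocks.get(value % num_blocks, [])
    return [row for row in candidates if row[key] == value]
```

With data laid out this way, an equality query on the key column reads a single block instead of the whole table, which is the effect the abstract attributes to key-column partitioning. A real system would need a proper SQL parser and a partitioning scheme suited to the column's type and value distribution.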

Journal: International Journal of Database Theory and Application
Publication Date: Jun 30, 2015
Citations: 10

Similar Papers

Distributed Matrix Multiplication Based on Frame Quantization for Straggler Mitigation
Kyungrak Son ... Wan Choi
IEEE Transactions on Signal Processing | VOL. 70
01 Jan 2021

Large-Scale Merging of Histograms using Distributed In-Memory Computing
Jakob Blomer ... Gerardo Ganis
Journal of Physics: Conference Series | VOL. 664
01 Dec 2015

Dimensions based data clustering and zone maps
Mohamed Ziauddin ... You Jung Kim
Proceedings of the VLDB Endowment | VOL. 10
01 Aug 2017

A Low-Complexity and Adaptive Distributed Source Coding Design for Model Aggregation in Distributed Learning
Naifu Zhang ... Meixia Tao
IEEE Open Journal of the Communications Society | VOL. 3
01 Jan 2021
