HEDC++: An Extended Histogram Estimator for Data in the Cloud

Ying-Jie Shi, Yan-Tao Gan, Xiao-Feng Meng, Fusheng Wang

doi:10.1007/s11390-013-1392-7

Abstract

With the increasing popularity of cloud-based data management, improving query performance in the cloud has become an urgent problem. Summaries of data distribution and statistical information are commonly used in traditional databases to support query optimization, and histograms are of particular interest. Naturally, histograms can also support query optimization and efficient use of computing resources in the cloud: they provide reference information for generating optimal query plans, and basic statistics that help balance the load of query processing. Because constructing an exact histogram on massive data is too expensive, building an approximate histogram is a more feasible solution. This problem, however, is challenging in the cloud environment because of the special data organization and processing mode in the cloud. In this paper, we present HEDC++, an extended histogram estimator for data in the cloud, which provides efficient approximation approaches for both equi-width and equi-depth histograms. We design the histogram estimation workflow based on an extended MapReduce framework, and propose novel sampling mechanisms that balance sampling efficiency against estimation accuracy. We experimentally validate our techniques on Hadoop, and the results demonstrate that HEDC++ can provide promising histogram estimates for massive data in the cloud.
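
To make the two histogram types concrete, the following is a minimal single-machine sketch of estimating them from a sample. It is not the HEDC++ algorithm: the abstract does not give the sampling mechanism or the MapReduce workflow, so the sketch assumes a plain uniform tuple-level sample, and the function names `equi_width_estimate` and `equi_depth_estimate` are hypothetical.

```python
def equi_width_estimate(sample, population_size, lo, hi, num_buckets):
    """Estimate equi-width bucket counts by scaling sample counts.

    Assumes every sampled value v satisfies lo <= v <= hi and that the
    sample was drawn uniformly from a population of `population_size`.
    """
    width = (hi - lo) / num_buckets
    counts = [0] * num_buckets
    for v in sample:
        # Clamp so that v == hi falls into the last bucket.
        idx = min(int((v - lo) / width), num_buckets - 1)
        counts[idx] += 1
    # Each sampled tuple stands for population_size / len(sample) tuples.
    scale = population_size / len(sample)
    return [c * scale for c in counts]


def equi_depth_estimate(sample, num_buckets):
    """Estimate equi-depth bucket boundaries from sample quantiles.

    Returns num_buckets - 1 separators; each bucket should then hold
    roughly the same number of tuples.
    """
    ordered = sorted(sample)
    n = len(ordered)
    return [ordered[(i * n) // num_buckets] for i in range(1, num_buckets)]


if __name__ == "__main__":
    data = list(range(1000))   # stand-in for a large table column
    sample = data[::10]        # deterministic 10% sample, for illustration only
    print(equi_width_estimate(sample, len(data), 0, 1000, 4))
    print(equi_depth_estimate(sample, 4))
```

An equi-width histogram fixes the bucket boundaries and estimates the counts, whereas an equi-depth histogram fixes the (equal) counts and estimates the boundaries, which is why the two functions return different kinds of values.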

Journal: Journal of Computer Science and Technology
Publication Date: Nov 1, 2013
Citations: 11

Similar Papers

HEDC
Yingjie Shi ... Yantao Gan
29 Oct 2012

Auditor judgment and decision-making in big data environment: a proposed research framework
Adli Hamdam ... Yazkhiruni Yahya
Accounting Research Journal | VOL. 35
26 Jan 2021

A comparison of selectivity estimators for range queries on metric attributes
Björn Blohsfeld ... Dieter Korus
ACM SIGMOD Record | VOL. 28
01 Jun 1999
