Automatic tuning of data synopses

Arnd Christian König, Gerhard Weikum

doi:10.1016/s0306-4379(02)00050-9

Abstract

Maintaining statistics on multidimensional data distributions is crucial for predicting the run-time and result size of queries and data analysis tasks with acceptable accuracy. Applications of such predictions include traditional query optimization, priority management and resource scheduling for data mining tasks, as well as querying heterogeneous Web data sources with diverse information quality. To this end, a plethora of techniques have been proposed for maintaining a compact data “synopsis” on a single table, ranging from variants of histograms to methods based on wavelets and other transforms. However, the fundamental question of how to reconcile the synopses for large information sources with many tables has been largely unexplored. This paper develops a general framework for reconciling the synopses on many tables, which may come from different information sources. It shows how to compute an optimal combination of synopses for a given workload and a limited amount of available memory. As the exact solution has high computational complexity, efficient heuristics are presented for limiting the search space of synopsis combinations. The practicality of the approach and the accuracy of the proposed heuristics are demonstrated by experiments.
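The abstract does not spell out the optimization procedure, so the following is only a minimal sketch of the general idea of dividing a fixed memory budget among per-table synopses so that a workload-weighted error is minimized. It is not the authors' algorithm. The function name `allocate_memory`, the table names, the workload weights, and the error model `weight / (1 + units)` are all hypothetical and chosen purely for illustration.

```python
from typing import Callable, Dict, List, Tuple


def allocate_memory(
    tables: List[str],
    budget: int,
    weighted_error: Callable[[str, int], float],
) -> Tuple[float, Dict[str, int]]:
    """Split `budget` integer memory units among `tables` so that the
    sum of weighted_error(table, units) is minimal.

    Illustrative dynamic program only; the error function is supplied
    by the caller and is an assumption, not taken from the paper.
    """
    # best[b] = (total error, allocation) for the tables processed so far
    # when at most b memory units are available.
    best: List[Tuple[float, Dict[str, int]]] = [
        (0.0, {}) for _ in range(budget + 1)
    ]
    for table in tables:
        new_best = []
        for b in range(budget + 1):
            # Try giving m units to this table and b - m to earlier tables.
            err, m = min(
                (best[b - m][0] + weighted_error(table, m), m)
                for m in range(b + 1)
            )
            alloc = dict(best[b - m][1])
            alloc[table] = m
            new_best.append((err, alloc))
        best = new_best
    return best[budget]


# Hypothetical workload weights: how much each table's estimation
# error matters for the queries in the workload.
weights = {"orders": 8.0, "customers": 2.0}


def toy_error(table: str, units: int) -> float:
    # Assumed error model: error shrinks as the synopsis gets more memory.
    return weights[table] / (1 + units)


total_error, allocation = allocate_memory(list(weights), 3, toy_error)
print(allocation, round(total_error, 4))
```

With these toy numbers the heavily queried table receives two of the three units. The running time grows with the number of tables times the square of the budget, which hints at why an exact search over richer synopsis combinations becomes expensive and why the paper turns to heuristics.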

Journal: Information Systems
Publication Date: Dec 5, 2002
Citations: 5

Similar Papers

A Framework for the Physical Design Problem for Data Synopses
Arnd Christian König ... Gerhard Weikum
01 Jan 2002

Dynamic maintenance of data distribution for selectivity estimation
Kyu-Young Whang ... Sang-Wook Kim
The VLDB Journal | VOL. 3
01 Jan 1993

Task Scheduling Mechanism Based on Reinforcement Learning in Cloud Computing
Yugui Wang ... Shizhong Dong
Mathematics | VOL. 11
01 Aug 2023

GIJA: Enhanced geyser‐inspired Jaya algorithm for task scheduling optimization in cloud computing
Laith Abualigah ... Mohammad Sh Daoud
Transactions on Emerging Telecommunications Technologies | VOL. 35
01 Jul 2024
