Result Size Estimation Research Articles

To approximate the result size of a spatial query, we partition the input rectangles into subsets and estimate the result size over the partitioned regions. This paper addresses query result size estimation for heavily skewed spatial data. We review existing spatial partitioning techniques, namely equi-area and equi-count partitioning, which are analogous to the equi-width and equi-height histograms widely used in relational databases, as well as partitioning techniques based on spatial indexing. We then propose a new spatial partitioning technique based on a spatial ordering method, the Hilbert space-filling curve. Using both synthetic and real-life datasets, we compare the estimation accuracy of the proposed technique against the existing techniques on skewed spatial data. In the experiments, partitioning by the Hilbert space-filling curve estimated query result sizes more accurately than the existing partitioning techniques as the spatial query size, the number of buckets, the degree of placement skew, and the data size were varied.
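
The abstract above does not give the partitioning procedure in detail, so the following is only a minimal sketch of one way Hilbert-ordered partitioning could work. It assumes rectangles are ordered by the Hilbert index of their centers, cut into equal-count buckets, and that data is uniform inside each bucket's bounding rectangle. The names `hilbert_index`, `hilbert_partition`, and `estimate_result_size`, and the parameters `order` and `extent`, are hypothetical and not taken from the paper.

```python
def hilbert_index(order, x, y):
    """Position of grid cell (x, y) on a Hilbert curve over a 2^order x 2^order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def hilbert_partition(rects, num_buckets, order=8, extent=1.0):
    """Group rectangles (xmin, ymin, xmax, ymax) into equal-count buckets.

    Assumption: rectangles are ordered by the Hilbert index of their
    centers, and coordinates lie in [0, extent].
    Returns a list of (bounding_rectangle, count) pairs.
    """
    if not rects:
        return []
    n = 1 << order

    def curve_position(r):
        cx, cy = (r[0] + r[2]) / 2, (r[1] + r[3]) / 2
        gx = min(n - 1, max(0, int(cx / extent * n)))
        gy = min(n - 1, max(0, int(cy / extent * n)))
        return hilbert_index(order, gx, gy)

    ordered = sorted(rects, key=curve_position)
    size = -(-len(ordered) // num_buckets)  # ceiling division
    buckets = []
    for i in range(0, len(ordered), size):
        chunk = ordered[i:i + size]
        mbr = (min(r[0] for r in chunk), min(r[1] for r in chunk),
               max(r[2] for r in chunk), max(r[3] for r in chunk))
        buckets.append((mbr, len(chunk)))
    return buckets


def estimate_result_size(buckets, query):
    """Estimate how many rectangles intersect `query`.

    Assumption: each bucket contributes its count scaled by the fraction
    of its bounding rectangle that the query covers.
    """
    qx1, qy1, qx2, qy2 = query
    total = 0.0
    for (x1, y1, x2, y2), count in buckets:
        w = min(qx2, x2) - max(qx1, x1)
        h = min(qy2, y2) - max(qy1, y1)
        if w < 0 or h < 0:
            continue  # query does not touch this bucket
        area = (x2 - x1) * (y2 - y1)
        total += count if area == 0 else count * (w * h) / area
    return total


# Usage: a skewed dataset clustered along a diagonal near the origin.
data = [(0.01 * i, 0.01 * i, 0.01 * i + 0.02, 0.01 * i + 0.02) for i in range(40)]
print(estimate_result_size(hilbert_partition(data, 4), (0.0, 0.0, 0.2, 0.2)))
```

Because the Hilbert curve keeps cells that are close on the curve close in space, consecutive runs of the ordering tend to form compact buckets, which is the intuition behind using it for skewed data.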


Benchmarking is an important phase in developing any new software technique because it helps to validate the underlying theory in the specific problem domain. But benchmarking new software strategies is a very complex problem, because it is difficult (if not impossible) to test, validate and verify the results of the various schemes in completely different settings. This is even more true for database systems, because the benchmarking also depends on the types of queries presented to the databases used in the experiments. Query optimization strategies in relational database systems rely on approximate estimates of query result sizes to minimize the response time for user queries. Among the many query result size estimation techniques, histogram-based techniques are by far the most commonly used in modern database systems. These techniques estimate query result sizes by approximating the underlying data distributions and are thus prone to estimation errors. In two recent works, we proposed (and thoroughly analyzed) two new histogram-like techniques, the rectangular and trapezoidal attribute cardinality maps (ACM), which give much smaller estimation errors than the traditional equi-width and equi-depth histograms currently used by many commercial database systems. This paper reports how the Rectangular-ACM (R-ACM) and the Trapezoidal-ACM (T-ACM) can be benchmarked for query optimization. Through an extensive set of experiments using the acclaimed TPC-D benchmark queries and database, we demonstrate that these new ACM schemes are much more accurate than the traditional histograms for query result size estimation. Apart from demonstrating the power of the ACMs, this paper also shows how TPC-D benchmarking can be carried out using a large synthetic database with many different patterns of synthetic queries that are representative of a real-world business environment.
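
For readers unfamiliar with the baselines named in this abstract, the following is a rough sketch of equi-width and equi-depth histograms used for range-query result size estimation. It does not implement the R-ACM or T-ACM, whose construction is not described here. The function names are hypothetical, and the estimate assumes values are spread uniformly inside each bucket, which is exactly the approximation that makes such histograms prone to error on skewed data.

```python
def build_equi_width(values, num_buckets):
    """Buckets of equal value range; returns a list of (low, high, count)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("need at least two distinct values")
    width = (hi - lo) / num_buckets
    counts = [0] * num_buckets
    for v in values:
        i = min(num_buckets - 1, int((v - lo) / width))
        counts[i] += 1
    return [(lo + i * width, lo + (i + 1) * width, counts[i])
            for i in range(num_buckets)]


def build_equi_depth(values, num_buckets):
    """Buckets holding roughly equal numbers of values; returns (low, high, count)."""
    ordered = sorted(values)
    n = len(ordered)
    buckets = []
    for i in range(num_buckets):
        start, end = i * n // num_buckets, (i + 1) * n // num_buckets
        if start == end:
            continue  # more buckets than values
        chunk = ordered[start:end]
        buckets.append((chunk[0], chunk[-1], len(chunk)))
    return buckets


def estimate_range(buckets, a, b):
    """Estimate how many values fall in [a, b].

    Assumption: values are uniformly distributed within each bucket.
    """
    total = 0.0
    for lo, hi, count in buckets:
        if hi == lo:
            # Single-valued bucket: either fully inside the range or not.
            total += count if a <= lo <= b else 0
            continue
        overlap = min(b, hi) - max(a, lo)
        if overlap > 0:
            total += count * overlap / (hi - lo)
    return total


# Usage: a skewed column where most rows share one value.
column = [1] * 900 + list(range(2, 102))
for build in (build_equi_width, build_equi_depth):
    print(build.__name__, estimate_range(build(column, 10), 1, 5))
```

Comparing the two printed estimates against the true count for the same range gives a feel for how bucket boundaries affect accuracy on skewed data.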


Articles published on Result Size Estimation

A Correlated Network Scale-Up Model: Finding the Connection Between Subpopulations

Particle motion artifacts in equilibrium magnetization measurements of large iron oxide nanoparticles

A mobile angular scattering microscope for organelle size estimation.

Using multi-frequency acoustic attenuation to monitor grain size and concentration of suspended sediment in rivers

Dye shift: a neglected source of genotyping error in molecular ecology

A Result Size Estimation Algorithm for Value Predication in XML Query

Spatial Partitioning Using Hilbert Space Ordering for Spatial Query Optimization (공간 질의 최적화를 위한 힐버트 공간 순서화에 따른 공간 분할)

Benchmarking attribute cardinality maps for database systems using the tpc-d specifications
