SQL Aggregation Research Articles

Preparing a data set for analysis is generally the most time-consuming task in a data mining project, requiring many complex SQL queries, joining tables, and aggregating columns. Existing SQL aggregations are limited for preparing data sets because they return one column per aggregated group. In general, significant manual effort is required to build data sets where a horizontal layout is required. We propose simple yet powerful methods to generate SQL code that returns aggregated columns in a horizontal tabular layout, returning a set of numbers instead of one number per row. This new class of functions is called horizontal aggregations. Horizontal aggregations build data sets with a horizontal denormalized layout (e.g., point-dimension, observation-variable, instance-feature), which is the standard layout required by most data mining algorithms. We propose three fundamental methods to evaluate horizontal aggregations: (1) CASE, exploiting the programming CASE construct; (2) SPJ, based on standard relational algebra operators (SPJ queries); and (3) PIVOT, using the PIVOT operator, which is offered by some DBMSs. Experiments with large tables compare the proposed query evaluation methods. Our CASE method has similar speed to the PIVOT operator and is much faster than the SPJ method. In general, the CASE and PIVOT methods exhibit linear scalability, whereas the SPJ method does not. For query optimization, the distance and nearest-cluster computations in k-means are based on SQL. Workload balancing is the assignment of work to processors in a way that maximizes application performance. The process of load balancing can be generalized into four basic steps: (1) monitoring processor load and state; (2) exchanging workload and state information between processors; (3) decision making; and (4) data migration. The decision phase is triggered when a load imbalance is detected, and it calculates an optimal data redistribution. In the fourth and last phase, data migrates from overloaded processors to underloaded ones.
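
To make the CASE method concrete, the following is a minimal sketch and not the authors' code generator. It assumes a hypothetical table sales(store, quarter, amount) in an in-memory SQLite database; the function name horizontal_sum_case is also made up for illustration. The function generates one SUM(CASE ...) term per distinct value of the pivoting column, so each group is returned as a single row with one aggregated column per value.

```python
import sqlite3


def horizontal_sum_case(conn, table, group_col, pivot_col, value_col):
    """Run a CASE-based horizontal SUM aggregation (illustrative sketch).

    Returns (column_names, rows): one row per group and one aggregated
    column per distinct value of pivot_col.
    Identifiers are interpolated into the SQL text, so they must come
    from trusted code, never from user input.
    """
    cur = conn.cursor()

    # Each distinct pivot value becomes one output column.
    pivot_values = [
        row[0]
        for row in cur.execute(
            f'SELECT DISTINCT "{pivot_col}" FROM "{table}" ORDER BY 1'
        )
    ]

    # One SUM(CASE ...) term per pivot value; the value is bound as a parameter.
    case_terms = ", ".join(
        f'SUM(CASE WHEN "{pivot_col}" = ? THEN "{value_col}" ELSE NULL END)'
        for _ in pivot_values
    )
    sql = (
        f'SELECT "{group_col}", {case_terms} '
        f'FROM "{table}" GROUP BY "{group_col}" ORDER BY "{group_col}"'
    )
    rows = cur.execute(sql, pivot_values).fetchall()
    columns = [group_col] + [str(v) for v in pivot_values]
    return columns, rows


# Hypothetical sample data in the usual vertical layout.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, quarter TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("A", "Q1", 10), ("A", "Q1", 5), ("A", "Q2", 7), ("B", "Q1", 3), ("B", "Q3", 4)],
)

columns, rows = horizontal_sum_case(conn, "sales", "store", "quarter", "amount")
print(columns)
for row in rows:
    print(row)
```

For the sample rows, each store comes back as one row with one column per quarter. A store and quarter combination with no input rows yields NULL (None in Python), because SUM over only NULL inputs is NULL.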

OLAP is a core functionality in database systems, and its performance is crucial to enabling on-time decisions. However, OLAP queries are rather time-consuming, especially on large datasets, and traditional exact solutions usually cannot meet the high-performance requirement. Recently, approximate query processing (AQP) has been proposed to enable approximate OLAP. However, existing AQP methods have some limitations. First, they may involve unacceptable errors on skewed data (e.g., long-tail distributions). Second, they require storing a large amount of data and yield no significant performance improvement. Third, they support only a small subset of SQL aggregation queries. To overcome these limitations, we propose a bounded approximate query processing framework, BAQ. Given a predefined error bound and a set of queries, BAQ judiciously selects high-quality samples from the data to generate a unified synopsis offline, and then uses the synopsis to answer online queries. Compared with existing methods, BAQ has the following salient features. (1) BAQ does not need to generate a synopsis for each query; it generates only a unified synopsis, and thus has a much smaller synopsis. (2) BAQ achieves much smaller error than existing studies. Specifically, BAQ can provide deterministic approximate results (i.e., the estimated query results must be within the error bound with 100 percent confidence) for SQL aggregation queries that do not contain selection conditions on numerical columns. For queries with selection conditions on numerical columns, we propose effective grouping-based techniques, and the estimated results are also within the error bound in practice. Experimental results on both real and synthetic datasets show that BAQ significantly outperforms state-of-the-art approaches. For example, on a Microsoft production dataset (a real dataset with synthetic queries), BAQ has a 10-100× improvement in synopsis size and a 10-100× improvement in error compared with state-of-the-art algorithms.
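
As a much simpler illustration of what a deterministic error bound on an aggregate means, the sketch below is not the BAQ algorithm, whose sample selection the abstract does not describe. It swaps in a plain bucketing scheme under stated assumptions: values are strictly positive, there is no selection condition on the numerical column, and the names build_synopsis and approx_sum are hypothetical. Values are grouped into buckets whose maximum is at most (1 + 2 * rel_error) times their minimum, and each bucket is stored as a midpoint and a count. Every value then lies within rel_error of its bucket midpoint in relative terms, so the estimated SUM is within rel_error of the true SUM, up to floating-point rounding.

```python
def build_synopsis(values, rel_error):
    """Compress positive values into (midpoint, count) buckets.

    Illustrative bucketing scheme, not the BAQ method. Each bucket spans
    [lo, hi] with hi <= lo * (1 + 2 * rel_error), so every value in it
    differs from the midpoint by at most rel_error times that value.
    """
    if rel_error <= 0:
        raise ValueError("rel_error must be positive")

    buckets = []  # each entry is [lo, hi, count]
    for v in sorted(values):
        if v <= 0:
            raise ValueError("this sketch assumes strictly positive values")
        if buckets and v <= buckets[-1][0] * (1 + 2 * rel_error):
            # v still fits in the current bucket: extend its upper end.
            buckets[-1][1] = v
            buckets[-1][2] += 1
        else:
            # Start a new bucket at v.
            buckets.append([v, v, 1])
    return [((lo + hi) / 2, count) for lo, hi, count in buckets]


def approx_sum(synopsis):
    """Estimate SUM from the synopsis alone."""
    return sum(midpoint * count for midpoint, count in synopsis)


# Hypothetical skewed data: a few small values and one large outlier.
values = [1, 2, 3, 100, 101, 109, 1000]
rel_error = 0.05
synopsis = build_synopsis(values, rel_error)
estimate = approx_sum(synopsis)
true_sum = sum(values)
print(len(synopsis), "buckets for", len(values), "values")
print("absolute error:", abs(estimate - true_sum))
print("allowed error:", rel_error * true_sum)
```

COUNT is exact under this scheme because the bucket counts are stored, and AVG inherits the same relative bound as SUM.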

Articles published on SQL Aggregation

Towards an optimized GROUP BY abstraction for large-scale machine learning

Workload Optimization by Horizontal Aggregation in SQL for Data Mining Analysis

A linear programming-based framework for handling missing data in multi-granular data warehouses

Aggregate Searchable Encryption With Result Privacy

Bounded Approximate Query Processing

Detecting measurement issues in SQL arithmetic expressions and aggregations

Fixpoint semantics and optimization of recursive Datalog programs with aggregates

A Survey on Preparing Data Sets for Data Mining Analysis using Horizontal Aggregations in SQL

Preparing Data Sets by Using Horizontal Aggregations in SQL for Data Mining Analysis

Database Transformation to Build Dataset for Generation of Decision Tree and Extended ER Model

Preparing Data Sets for the Data Mining Analysis using the Most Efficient Horizontal Aggregation Method in SQL

Exploiting compression and approximation paradigms for effective and efficient online analytical processing over sensor network readings in data grid environments

Efficient Tabular Dataset Preparations by the Aggregations in SQL: A Survey

Computing Structural Statistics by Keywords in Databases

Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis

Percentage Aggregation Functions by Extending SQL

Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis

Divide-and-approximate: a novel constraint push strategy for iceberg cube mining

Foundations of aggregation constraints

Improved query performance with variant indexes
