Horizontal aggregations for building tabular data sets

Carlos Ordonez

doi:10.1145/1008694.1008700

Abstract

In a data mining project, a significant portion of time is devoted to building a data set suitable for analysis. In a relational database environment, building such a data set usually requires joining tables and aggregating columns with SQL queries. Existing SQL aggregations are limited because they return a single number per aggregated group, producing one row for each computed number. These aggregations help, but significant effort is still required to build data sets suitable for data mining, where a tabular format is generally required. This work proposes very simple, yet powerful, extensions to SQL aggregate functions that produce aggregations in tabular form, returning a set of numbers per row instead of a single number. We call this new class of functions horizontal aggregations. Horizontal aggregations help build answer sets in tabular form (e.g., point-dimension, observation-variable, instance-feature), which is the standard form needed by most data mining algorithms. Two common data preparation tasks are explained: transposition/aggregation and transforming categorical attributes into binary dimensions. We propose two strategies to evaluate horizontal aggregations using standard SQL. The first strategy is based only on relational operators, and the second uses the CASE construct. Experiments with large data sets study the proposed query optimization strategies.
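
To make the CASE-based strategy concrete, the following is a minimal sketch that emulates a horizontal aggregation in standard SQL, run through Python's built-in sqlite3 module. It is not the syntax extension proposed in the paper. The table `sales`, its columns `store_id`, `day`, and `amount`, and the helper `horizontal_sum` are hypothetical names chosen only for illustration.

```python
import sqlite3


def horizontal_sum(conn, table, group_col, pivot_col, value_col):
    """Return one row per group with one SUM column per distinct pivot value.

    Hypothetical helper: identifiers are interpolated directly into the SQL
    text, so they must come from trusted code, never from user input.
    """
    cur = conn.cursor()
    # Step 1: find the distinct values that will become columns.
    pivot_values = [r[0] for r in cur.execute(
        f"SELECT DISTINCT {pivot_col} FROM {table} ORDER BY {pivot_col}")]
    # Step 2: one SUM(CASE ...) term per pivot value; rows that do not
    # match yield NULL, which SUM ignores.
    terms = ", ".join(
        f"SUM(CASE WHEN {pivot_col} = ? THEN {value_col} END)"
        for _ in pivot_values)
    query = (f"SELECT {group_col}, {terms} FROM {table} "
             f"GROUP BY {group_col} ORDER BY {group_col}")
    return pivot_values, cur.execute(query, pivot_values).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store_id INTEGER, day TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    (1, "Mon", 10.0), (1, "Mon", 5.0), (1, "Tue", 7.0),
    (2, "Tue", 3.0),
])

columns, rows = horizontal_sum(conn, "sales", "store_id", "day", "amount")
print(columns)  # pivot values that became columns
print(rows)     # one row per store_id
```

A plain `GROUP BY store_id, day` query would return one row per (store, day) pair. Here each store occupies a single row with one column per day, which is the tabular layout the abstract describes. A group with no matching rows for a given pivot value gets NULL in that column.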
