Compressing table data with column dependency

Binh Dao Vo, Kiem-Phong Vo

doi:10.1016/j.tcs.2007.07.016

Journal: Theoretical Computer Science
Publication Date: Jul 27, 2007
Citations: 21
License type: elsevier-specific: oa user license

Affiliation: Columbia University, AT&T (United States)

Abstract

Tables are two-dimensional arrays given in row-major order. Such data have unique features that can be exploited for effective compression. For example, tables often represent database files with rows as records, so certain columns or fields in a table may have few distinct values. Simply transposing the data, so that the values of each column are stored contiguously, can therefore make it compress better. Further, a large source of information redundancy in a table is the correlation among columns representing related types of data. This paper formalizes the notion of column dependency as a way to capture this information redundancy across columns and discusses how to compute it automatically and use it to substantially improve table compression.
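
The transposition idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`transpose`, `untranspose`, `make_table`), the fixed 8-byte record layout, the synthetic data, and the use of zlib as the back-end compressor are all assumptions made for illustration. The sketch reorders a table of fixed-width rows from row-major to column-major order, compresses both layouts, and prints the two sizes for comparison.

```python
import random
import zlib


def transpose(data: bytes, row_len: int) -> bytes:
    """Reorder a row-major table of fixed-width rows into column-major order."""
    if row_len <= 0 or len(data) % row_len != 0:
        raise ValueError("data length must be a positive multiple of row_len")
    # data[j::row_len] selects byte j of every row, i.e. byte-column j.
    return b"".join(data[j::row_len] for j in range(row_len))


def untranspose(data: bytes, row_len: int) -> bytes:
    """Inverse of transpose(): rebuild the row-major layout."""
    if row_len <= 0 or len(data) % row_len != 0:
        raise ValueError("data length must be a positive multiple of row_len")
    n_rows = len(data) // row_len
    # In column-major order, element (i, j) sits at j * n_rows + i,
    # so data[i::n_rows] collects the bytes of row i.
    return b"".join(data[i::n_rows] for i in range(n_rows))


def make_table(n_rows: int, seed: int = 0) -> bytes:
    """Build a hypothetical table: 2-byte state, 1-byte status, 5-digit id."""
    rng = random.Random(seed)
    states = [b"NY", b"NJ", b"CA"]   # column with few distinct values
    statuses = [b"A", b"I"]          # another low-cardinality column
    rows = []
    for i in range(n_rows):
        rows.append(rng.choice(states) + rng.choice(statuses) + b"%05d" % i)
    return b"".join(rows)


if __name__ == "__main__":
    row_len = 8
    table = make_table(5000)
    by_column = transpose(table, row_len)
    # The transposition is lossless: the original table can be recovered.
    assert untranspose(by_column, row_len) == table
    print("row-major compressed bytes:   ", len(zlib.compress(table, 9)))
    print("column-major compressed bytes:", len(zlib.compress(by_column, 9)))
```

How much the column-major layout helps depends on the data and on the compressor, so the printed sizes should be read as an illustration rather than a benchmark. The sketch also treats each byte position as its own column and does not attempt to model dependency between columns.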

Full Text