Abstract

Tables are two-dimensional arrays given in row-major order. Such data have distinctive features that can be exploited for effective compression. For example, tables often represent database files with rows as records, so certain columns or fields in a table may have few distinct values. Simply transposing the data, so that the values of each column are stored contiguously, can therefore make it compress better. Further, a large source of information redundancy in a table is the correlation among columns representing related types of data. This paper formalizes the notion of column dependency as a way to capture this redundancy across columns and discusses how to compute it automatically and use it to substantially improve table compression.
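
The transposition effect can be illustrated with a minimal Python sketch on a synthetic table. The column layout, field names, and values below are hypothetical, and zlib's Deflate stands in for a general-purpose compressor; this is not the compression method of the paper. The sketch serializes the same records in row-major and column-major order, checks that the transpose is lossless, and compares compressed sizes.

```python
import random
import zlib

# Hypothetical fixed-width columns (not from the paper): id, status, amount, region.
WIDTHS = [8, 6, 6, 2]

def make_table(n_rows, seed=0):
    """Synthetic records: two high-entropy numeric fields and two
    low-cardinality fields (3 and 2 distinct values)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        rows.append([
            f"{rng.randrange(10**8):08d}",               # id: random 8 digits
            rng.choice(["ACTIVE", "CLOSED", "HOLD__"]),  # status: 3 values
            f"{rng.randrange(10**6):06d}",               # amount: random 6 digits
            rng.choice(["NA", "EU"]),                    # region: 2 values
        ])
    return rows

def row_major(rows):
    """Record after record: fields of different columns are interleaved."""
    return "".join("".join(r) for r in rows).encode("ascii")

def column_major(rows):
    """The transpose: all values of column 0, then column 1, and so on."""
    return "".join(
        "".join(r[c] for r in rows) for c in range(len(WIDTHS))
    ).encode("ascii")

def from_column_major(data, n_rows):
    """Invert column_major using the known fixed column widths."""
    text = data.decode("ascii")
    columns, pos = [], 0
    for w in WIDTHS:
        columns.append([text[pos + i * w: pos + (i + 1) * w] for i in range(n_rows)])
        pos += w * n_rows
    return [list(rec) for rec in zip(*columns)]

rows = make_table(20000)
row_bytes, col_bytes = row_major(rows), column_major(rows)
print("uncompressed:", len(row_bytes), len(col_bytes))
print("row-major deflate:", len(zlib.compress(row_bytes, 9)))
print("column-major deflate:", len(zlib.compress(col_bytes, 9)))
```

Both layouts contain exactly the same bytes, only permuted. In the column-major layout each low-cardinality column becomes one long stretch over a tiny set of values, which Deflate's back-references capture with long matches. In the row-major layout those values are separated by high-entropy digits and must be matched one short field at a time. Exact sizes depend on the seed and zlib version, so none are quoted here.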
