Abstract

Large-scale relational databases typically combine enormous size with a high degree of sparsity, which makes database compression important both for improving performance and for saving storage space. Standard syntactic compression techniques such as Gzip or Zip do not take advantage of relational properties, because they ignore the nature of the data. Semantic compression, by contrast, is highly effective for table data: it accounts for and exploits both the meanings and dynamic error ranges of individual attributes (lossy compression) and the existing dependencies and correlations between attributes in the table (lossless compression). Inspired by semantic compression, this study proposes a novel independent lossless compression system that utilises a data-mining model to find the frequent pattern with maximum gain (the representative row) and thereby capture attribute semantics, together with a modified augmented vector quantisation coder that increases the total throughput of database compression. When compression ratio, space, and speed are considered together, the algorithm is fine-grained and suitable for massive data tables of every kind. Experiments on several very large real-life datasets indicate the superiority of the system over previously known lossless semantic techniques.
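The abstract gives no implementation details, but the representative-row idea can be illustrated with a minimal, hypothetical Python sketch: mine one row of the most frequent attribute values as a stand-in for the frequent pattern with maximum gain, then losslessly encode every tuple as its deltas from that row. All names below are illustrative assumptions, not the authors' actual algorithm.

from collections import Counter

def representative_row(table):
    # Per-column mode as a simple proxy for the paper's
    # maximum-gain frequent pattern (an assumption).
    columns = list(zip(*table))
    return tuple(Counter(col).most_common(1)[0][0] for col in columns)

def encode(table, rep):
    # Lossless: store only the (column, value) cells that
    # differ from the representative row.
    return [[(i, v) for i, v in enumerate(row) if v != rep[i]]
            for row in table]

def decode(deltas, rep):
    # Exact reconstruction of the original rows.
    rows = []
    for delta in deltas:
        row = list(rep)
        for i, v in delta:
            row[i] = v
        rows.append(tuple(row))
    return rows

table = [("NY", "retail", 0), ("NY", "retail", 1),
         ("NY", "wholesale", 0), ("LA", "retail", 0)]
rep = representative_row(table)          # ('NY', 'retail', 0)
assert decode(encode(table, rep), rep) == table

On sparse tables dominated by a few frequent values, such a delta encoding shrinks most rows to a handful of cells, which is the intuition behind mining a representative row.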
