Abstract

Data compression has been widely adopted in industry to reduce storage and bandwidth consumption by removing redundant data or re-encoding information. Semantic redundancy means that some facts in a knowledge base can be inferred from others. For relational databases, this makes it possible to remove records on the basis of semantic equivalence. In this paper, we present a purely semantic approach that first compresses relational data losslessly and then enhances data file compression to further reduce storage. Our Semantic Inductive Compressor (SInC) handles not only intra-relation patterns but also inter-relation ones. SInC achieves semantic compression ratios of around 1/3 to 2/3, and the original data can be fully recovered using the informative patterns induced by SInC. We apply industrial data compression tools to the semantically compressed databases, and the experimental results show that the compression ratio improves by up to 35%. Almost all of the effort of our technique contributes to this enhancement.
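To make "some facts can be inferred from the others" concrete, here is a minimal Python sketch of rule-based lossless compression on a toy set of facts. It is not SInC's algorithm: SInC induces its patterns automatically, whereas here a single rule is written by hand. The rule `father_implies_male`, the relation names, the toy facts, and the choice to store wrongly derived facts as counterexamples are all hypothetical assumptions of this sketch. Facts the rule can re-derive are dropped; decompression re-applies the rule and removes the recorded counterexamples.

```python
from typing import Callable, Set, Tuple

# A fact is a tuple: (relation name, argument 1, argument 2, ...).
Fact = Tuple[str, ...]
Rule = Callable[[Set[Fact]], Set[Fact]]


def father_implies_male(facts: Set[Fact]) -> Set[Fact]:
    """Hypothetical hand-written rule: gender(X, male) :- father(X, Y)."""
    return {("gender", f[1], "male") for f in facts if f[0] == "father"}


def compress(facts: Set[Fact], rule: Rule) -> Tuple[Set[Fact], Set[Fact]]:
    """Drop facts the rule can re-derive; keep wrong derivations as counterexamples.

    Assumes the rule's head relation never appears in its own body, so removing
    derivable facts does not change what the rule derives from the remainder.
    """
    inferred = rule(facts)
    necessary = facts - inferred          # facts the rule cannot reproduce
    counterexamples = inferred - facts    # derivations absent from the original
    return necessary, counterexamples


def decompress(necessary: Set[Fact], counterexamples: Set[Fact], rule: Rule) -> Set[Fact]:
    """Re-apply the rule, then remove derivations recorded as counterexamples."""
    return (necessary | rule(necessary)) - counterexamples


# Toy data (hypothetical): pat is a father whose gender is not recorded.
facts: Set[Fact] = {
    ("father", "tom", "ann"),
    ("father", "tom", "bob"),
    ("father", "sam", "cat"),
    ("father", "pat", "dan"),
    ("gender", "tom", "male"),
    ("gender", "sam", "male"),
    ("gender", "ann", "female"),
}

necessary, counterexamples = compress(facts, father_implies_male)
restored = decompress(necessary, counterexamples, father_implies_male)

assert restored == facts  # lossless round trip
stored = len(necessary) + len(counterexamples)
print(f"original: {len(facts)} facts, stored: {stored} facts + 1 rule")
```

Here two gender facts become redundant, while the missing gender of `pat` forces one counterexample. The saving only pays off when a rule covers many records with few exceptions, which is the kind of pattern an inductive compressor would search for.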
