Abstract

The current wave of AI is heavily driven by data, especially for cognitive capabilities. Minimizing a database to its semantic core not only reveals its essential information but also guides analysis in a wide range of domains. However, purely semantic methodologies scale poorly in theory, and existing techniques sacrifice too much expressiveness in order to cope with large databases, so the quality of the discovered patterns and redundancies remains unsatisfactory. In this article, we formalize symbolic minimization on relational databases and prove it NP-complete. We propose a lossless technique that induces generic first-order Horn rules to infer a subset of records from the others. More importantly, we further improve scalability via effective caching and pruning without sacrificing the expressiveness of first-order Horn rules. We implement a concrete system and evaluate it comprehensively. Experiments show that our technique removes up to 70% of the contents and outperforms the state of the art in both minimization and scalability. The optimizations reduce memory consumption by up to 96% and accelerate performance by two orders of magnitude. Our technique demonstrates the practicality of purely semantic approaches to database mining.
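To illustrate the idea of lossless minimization via Horn rules, here is a minimal sketch. The tables, the rule, and all names below are hypothetical, not the paper's system: a single Horn rule (grandparent(x, z) :- parent(x, y), parent(y, z)) lets some records be dropped, because they can be re-derived from the remaining data, so storing the residual records plus the rule is lossless.

```python
# Hypothetical relations, stored as sets of tuples.
parent = {("ann", "bob"), ("bob", "cal"), ("bob", "dee")}
grandparent = {("ann", "cal"), ("ann", "dee"), ("eve", "fay")}  # ("eve","fay") is not inferable

def apply_rule(parents):
    """Horn rule: grandparent(x, z) :- parent(x, y), parent(y, z)."""
    return {(x, z) for (x, y1) in parents for (y2, z) in parents if y1 == y2}

# Records that the rule can re-derive may be removed from storage.
inferable = apply_rule(parent) & grandparent   # {("ann","cal"), ("ann","dee")}
residual = grandparent - inferable             # what remains after minimization

# Lossless check: the residual plus the rule reconstructs the original table.
restored = residual | apply_rule(parent)
assert restored == grandparent
```

Here two of the three grandparent records are removed, while the non-inferable record is kept; a real system would additionally have to handle rules whose consequences are not all present in the original table, which this sketch sidesteps by intersecting with the stored relation.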
