Abstract

The current wave of AI is heavily driven by data, especially for cognitive capabilities. Minimizing a database to its semantic core not only reveals its essential information but also guides analysis in a wide range of domains. However, purely semantic methodologies scale poorly in theory, and existing techniques sacrifice too much expressiveness in order to cope with large databases, so the quality of the discovered patterns and redundancies remains unsatisfactory. In this article, we formalize symbolic minimization on relational databases and prove it NP-complete. We propose a lossless technique that induces generic first-order Horn rules to infer a subset of records from the others. More importantly, we further improve scalability via effective caching and pruning without sacrificing the expressiveness of first-order Horn rules. We implement a concrete system and evaluate it comprehensively. Experiments show that our technique removes up to 70% of the contents and outperforms the state of the art in both minimization and scalability. The optimizations reduce memory consumption by up to 96% and accelerate performance by two orders of magnitude. Our technique demonstrates the practicality of purely semantic approaches to database mining.
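To illustrate the idea of lossless minimization via Horn rules, here is a minimal sketch. The tables, the rule, and all names below are hypothetical, not the paper's system: a single Horn rule (grandparent(x, z) :- parent(x, y), parent(y, z)) lets some records be dropped, because they can be re-derived from the remaining data, so storing the residual records plus the rule is lossless.

```python
# Hypothetical relations, stored as sets of tuples.
parent = {("ann", "bob"), ("bob", "cal"), ("bob", "dee")}
grandparent = {("ann", "cal"), ("ann", "dee"), ("eve", "fay")}  # ("eve","fay") is not inferable

def apply_rule(parents):
    """Horn rule: grandparent(x, z) :- parent(x, y), parent(y, z)."""
    return {(x, z) for (x, y1) in parents for (y2, z) in parents if y1 == y2}

# Records that the rule can re-derive may be removed from storage.
inferable = apply_rule(parent) & grandparent   # {("ann","cal"), ("ann","dee")}
residual = grandparent - inferable             # what remains after minimization

# Lossless check: the residual plus the rule reconstructs the original table.
restored = residual | apply_rule(parent)
assert restored == grandparent
```

Here two of the three grandparent records are removed, while the non-inferable record is kept; a real system would additionally have to handle rules whose consequences are not all present in the original table, which this sketch sidesteps by intersecting with the stored relation.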
