Abstract

Several encodings based on propositional satisfiability (SAT) have been proposed for data mining problems, including itemset and sequence mining. This line of research makes it possible to model data mining problems declaratively while exploiting efficient SAT solving techniques. In this paper, we give an overview of our contributions on applying SAT to model and solve itemset mining tasks. First, we present a SAT-based encoding of the frequent closed itemset mining task as a propositional formula whose models correspond to the patterns to be mined. Second, we show that some data mining constraints can be avoided by reformulation. We illustrate this by reformulating the closedness constraint using the notion of minimal models. Finally, we address scalability, one of the most important challenges for these declarative frameworks. To this end, we propose a complete partition-based approach whose aim is to avoid encoding the whole database as a single formula. Using a partition of the set of items, our new approach yields several propositional formulas of reasonable size. The experimental evaluation on several well-known datasets shows substantial improvements over the direct approach without partitioning, while significantly reducing the performance gap with respect to specialized algorithms.
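To make the idea of "a formula whose models correspond to the patterns" concrete, the following is a minimal sketch, not the encoding of the paper itself. It assumes one Boolean variable per item (`p`) and one per transaction (`q`), with a cover, a frequency, and a closure constraint; these names and the exact constraint shapes are illustrative assumptions. Instead of calling a SAT solver, it enumerates all assignments by brute force, which is only workable for toy databases.

```python
from itertools import product

def closed_frequent_itemsets(transactions, min_support):
    """Enumerate models of a toy frequent-closed-itemset encoding.

    Hypothetical variables: p[a] means "item a is in the itemset",
    q[i] means "transaction i contains the whole itemset".
    """
    items = sorted(set().union(*transactions))
    results = []
    for bits in product([False, True], repeat=len(items)):
        p = dict(zip(items, bits))
        # Cover constraint: q[i] holds iff no selected item is missing from T_i.
        q = [all(not p[a] for a in items if a not in t) for t in transactions]
        # Frequency constraint: at least min_support transactions are covered.
        if sum(q) < min_support:
            continue
        # Closure constraint: if every covered transaction contains item a,
        # then a must belong to the itemset.
        closed = all(
            p[a] or any(q[i] and a not in t for i, t in enumerate(transactions))
            for a in items
        )
        if closed:
            results.append(frozenset(a for a in items if p[a]))
    return results

if __name__ == "__main__":
    db = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}]
    for itemset in closed_frequent_itemsets(db, min_support=2):
        print(sorted(itemset))
```

Each assignment that passes all three checks is one model of the toy formula and therefore one frequent closed itemset. A real implementation would hand the same constraints to a SAT solver, with the frequency constraint expressed as a cardinality constraint.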
