Abstract

Many pattern mining tasks have been modeled and solved using constraint programming (CP) and propositional satisfiability (SAT). In these two well-known declarative AI models, the problem is encoded as a constraint network or a propositional formula whose models correspond to the patterns of interest. In this declarative framework, new user-specified constraints can be easily integrated, whereas in traditional data mining such additional constraints might require an implementation from scratch. Unfortunately, these declarative data mining approaches do not scale to large datasets, where they produce very large encodings. In this paper, we propose a compact SAT-based encoding for itemset mining tasks, obtained by rewriting some key constraints. We prove that this reformulation can be expressed as a Boolean matrix compression problem. To address this problem, we propose a greedy approach that considerably reduces the size of the encoding while improving the pattern enumeration step. Finally, we provide experimental evidence that the proposed approach achieves a significant reduction in the size of the encoding. These results show notable improvements for this compact SAT-based itemset mining approach and significantly narrow the gap with the best state-of-the-art specialized algorithm.

Keywords: Data mining, Itemset mining, Satisfiability
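
To make concrete the idea that the models of a formula correspond to the patterns of interest, here is a minimal sketch of the commonly used baseline SAT view of frequent itemset mining. It is not the compact encoding proposed in the paper. It assumes one Boolean variable `p[a]` per item and one Boolean variable `q[i]` per transaction, a cover constraint stating that `q[i]` holds exactly when no selected item is missing from transaction `i`, and a frequency constraint requiring at least `min_support` true `q[i]`. The function name, the variable names, and the exclusion of the empty itemset are illustrative choices, and models are enumerated by brute force rather than by a SAT solver.

```python
from itertools import product


def frequent_itemsets(transactions, min_support):
    """Enumerate the models of the baseline encoding by brute force.

    transactions: list of sets of items.
    min_support: minimum number of transactions that must contain the itemset.
    Returns a list of (itemset, support) pairs.
    """
    items = sorted(set().union(*transactions))
    results = []
    # Try every truth assignment to the item variables p[a].
    for bits in product([False, True], repeat=len(items)):
        p = dict(zip(items, bits))
        # Cover constraint: q[i] is true iff every item absent from
        # transaction i is also absent from the candidate itemset.
        q = [all(not p[a] for a in items if a not in t) for t in transactions]
        support = sum(q)
        # Frequency constraint, plus the (assumed) exclusion of the empty itemset.
        if any(bits) and support >= min_support:
            results.append((frozenset(a for a in items if p[a]), support))
    return results


# Toy database, invented for illustration only.
db = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
found = dict(frequent_itemsets(db, min_support=2))
```

In this view the cover constraint contributes one clause group per transaction, each ranging over the items missing from that transaction, so the formula grows with the size of the transaction database. That growth is the scaling issue the abstract refers to.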

