DISSparse: Efficient Mining of Discriminative Itemsets

Majid Seyfi, Richi Nayak, Shlomo Geva, Yue Xu

doi:10.1142/s0219649222500095

Open Access

https://doi.org/10.1142/s0219649222500095

Abstract

We address the problem of discriminative itemset mining. Given a set of datasets, we seek the itemsets that are frequent in the target dataset and whose frequencies there are much higher than in the other datasets. Such itemsets are very useful for discriminating between datasets. We demonstrate that this problem has important applications and, at the same time, is very challenging. We present DISSparse, a mining algorithm that uses two determinative heuristics based on the sparsity of the discriminative itemsets, which form a small subset of the frequent itemsets. We prove that DISSparse is sound and complete. We experimentally evaluate the efficiency and stability of DISSparse on a range of datasets and show that it is substantially faster than the baseline method.
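
To make the problem statement concrete, the following is a minimal brute-force sketch of what "discriminative itemset" could mean in practice. It is not the DISSparse algorithm and uses none of its heuristics. The function names, the thresholds `min_support` and `min_ratio`, the size cap `max_size`, and the choice to pool all non-target data into a single collection are assumptions made for illustration only; the abstract does not specify the exact criterion.

```python
from itertools import combinations


def relative_support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def naive_discriminative_itemsets(target, others,
                                  min_support=0.5, min_ratio=2.0, max_size=3):
    """Enumerate itemsets exhaustively (illustrative only, exponential cost).

    An itemset is kept when it is frequent in `target` (assumed threshold
    `min_support`) and its support in `target` is at least `min_ratio`
    times its support in `others` (assumed criterion).
    """
    target = [frozenset(t) for t in target]
    others = [frozenset(t) for t in others]
    items = sorted(set().union(*target))
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            sup_t = relative_support(itemset, target)
            if sup_t < min_support:
                continue  # not frequent in the target dataset
            sup_o = relative_support(itemset, others)
            # Assumption: an itemset that never occurs in the other data
            # counts as discriminative.
            if sup_o == 0 or sup_t / sup_o >= min_ratio:
                result[itemset] = (sup_t, sup_o)
    return result


# Toy data, invented for this example.
target = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"a", "b"}]
others = [{"a"}, {"a", "c"}, {"a", "c"}, {"b", "c"}]
found = naive_discriminative_itemsets(target, others)
for itemset, (sup_t, sup_o) in found.items():
    print(sorted(itemset), sup_t, sup_o)
```

On this toy data, item `a` is frequent in the target but almost as frequent elsewhere, so it is rejected, while `{b}` and `{a, b}` are kept. Exhaustive enumeration of this kind grows exponentially with the number of items, which is the cost an efficient mining method would need to avoid.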
