Rare itemsets have been studied less extensively than frequent itemsets, but they have important potential applications to black swan events, such as anomaly detection. Mining rare itemsets poses two challenges: the number of results can be very large, and the process can incur high computational overhead. To address both challenges, we can mine only minimal rare itemsets (MRIs) and use heuristic methods that return approximate rather than exact results. This paper describes MRI-CE, a novel algorithm for mining MRIs using cross-entropy (CE). We present the modeling method for MRI-CE and introduce a progressive checking strategy that enables more MRIs to be discovered in each iteration. The discovered MRIs are then used to update a probability vector. We design two optimization strategies to improve the algorithm's performance: the adaptive sample size strategy narrows the search space as the number of iterations increases, and the crossover-based individual generation strategy improves the diversity of the samples. To evaluate the performance of MRI-CE, we compare it with six competitive algorithms in extensive experiments on publicly available datasets. The results show that the proposed algorithm is both efficient and highly accurate. We also verify the effectiveness of the two optimization strategies experimentally.
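
To make the idea concrete, the following is a minimal sketch of a generic cross-entropy loop for MRIs, not the MRI-CE algorithm itself. It assumes an itemset is rare when its support count is below a threshold `min_sup` and minimal when all of its proper subsets are frequent. The function names, the parameters (`n_samples`, `n_iters`, `alpha`), the greedy `shrink_to_mri` helper, and the clamping of probabilities are all illustrative assumptions; the progressive checking, adaptive sample size, and crossover strategies described above are not modeled.

```python
import random


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def is_minimal_rare(itemset, transactions, min_sup):
    """Rare (support < min_sup) with every (k-1)-subset frequent.

    Checking only the (k-1)-subsets is enough because support never
    increases when items are added to an itemset.
    """
    if not itemset or support(itemset, transactions) >= min_sup:
        return False
    return all(support(itemset - {i}, transactions) >= min_sup for i in itemset)


def shrink_to_mri(itemset, transactions, min_sup):
    """Hypothetical helper: greedily drop items while the set stays rare.

    Assumes `itemset` is rare and min_sup <= len(transactions).
    """
    current = set(itemset)
    for item in sorted(itemset):
        if len(current) > 1 and support(current - {item}, transactions) < min_sup:
            current.discard(item)
    return frozenset(current)


def ce_mine_mris(transactions, min_sup, n_samples=200, n_iters=20,
                 alpha=0.7, seed=0):
    """Generic CE-style search; returns the set of MRIs it happened to find."""
    rng = random.Random(seed)
    items = sorted(set().union(*transactions))
    p = {i: 0.5 for i in items}  # probability vector: one entry per item
    found = set()
    for _iteration in range(n_iters):
        batch = set()
        for _draw in range(n_samples):
            # Sample an itemset by including each item independently.
            sample = frozenset(i for i in items if rng.random() < p[i])
            if sample and support(sample, transactions) < min_sup:
                batch.add(shrink_to_mri(sample, transactions, min_sup))
        if not batch:
            continue
        found |= batch
        for i in items:
            freq = sum(i in m for m in batch) / len(batch)
            # Smoothed update (assumed), clamped so no item is ruled out for good.
            p[i] = min(0.95, max(0.05, (1 - alpha) * p[i] + alpha * freq))
    return found


# Toy database: worked by hand, its exact MRIs for min_sup=2 are {d} and {b, c}.
db = [frozenset(t) for t in ("ab", "ab", "ac", "ac", "bc", "d")]
mris = ce_mine_mris(db, min_sup=2)
print(sorted(sorted(m) for m in mris))
```

Because the search is heuristic, the returned set is guaranteed to contain only MRIs but may miss some of them; this is the trade-off between approximate and exact results mentioned above.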