Abstract

Rare itemsets have been studied less extensively than frequent itemsets, but they have important potential applications to black swan events, such as anomaly detection. Mining rare itemsets poses two challenges: the number of results may be too large, and the process may incur a high computational overhead. To overcome these two challenges, we can mine only minimal rare itemsets (MRIs) and use heuristic methods to obtain approximate rather than exact results. This paper describes a novel algorithm, MRI-CE, for mining MRIs using cross-entropy (CE). We present the modeling method for MRI-CE and introduce a progressive checking strategy that enables more MRIs to be discovered in each iteration. The discovered MRIs are then used to update a probability vector. We design two optimization strategies to improve the algorithm's performance. The adaptive sample size strategy narrows the search space as the number of iterations increases, and the crossover-based individual generation strategy improves the diversity of the samples. To evaluate the performance of MRI-CE, we select six competitive algorithms and conduct extensive experiments on publicly available datasets. The results show that the proposed algorithm is not only efficient, but also highly accurate. Furthermore, we verify the effectiveness of the two optimization strategies experimentally.
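To make the terminology concrete, the following is a minimal Python sketch and not the MRI-CE algorithm itself. It assumes the usual definition: an itemset is rare if its support is below a threshold, and minimal if every proper subset is frequent. The sampling loop is a generic cross-entropy scheme in which each item has an inclusion probability that is nudged toward the MRIs found so far. All names (`support`, `shrink_to_minimal`, `mine_mris_ce`, `alpha`, the toy data) are hypothetical, and `shrink_to_minimal` is a simple stand-in, not the paper's progressive checking strategy. The adaptive sample size and crossover strategies are not modeled.

```python
import random
from itertools import combinations

def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def is_minimal_rare(itemset, transactions, min_sup):
    """Rare, and every subset obtained by dropping one item is frequent.
    By anti-monotonicity of support this covers all proper subsets."""
    if not itemset or support(itemset, transactions) >= min_sup:
        return False
    return all(support(itemset - {i}, transactions) >= min_sup for i in itemset)

def brute_force_mris(transactions, min_sup):
    """Exact enumeration; only feasible for tiny item universes."""
    items = sorted(set().union(*transactions))
    return {
        frozenset(c)
        for k in range(1, len(items) + 1)
        for c in combinations(items, k)
        if is_minimal_rare(frozenset(c), transactions, min_sup)
    }

def shrink_to_minimal(itemset, transactions, min_sup):
    """Assumed helper: drop items from a rare itemset while it stays rare."""
    current = set(itemset)
    changed = True
    while changed:
        changed = False
        for item in sorted(current):
            smaller = current - {item}
            if smaller and support(smaller, transactions) < min_sup:
                current, changed = smaller, True
                break
    return frozenset(current)

def mine_mris_ce(transactions, min_sup, n_samples=200, n_iters=20,
                 alpha=0.7, seed=0):
    """Generic CE-style search: sample itemsets from a probability vector,
    keep the MRIs they lead to, and move the vector toward those MRIs."""
    rng = random.Random(seed)
    items = sorted(set().union(*transactions))
    p = {i: 0.5 for i in items}          # inclusion probability per item
    found = set()
    for _ in range(n_iters):
        hits = set()
        for _ in range(n_samples):
            s = frozenset(i for i in items if rng.random() < p[i])
            if s and support(s, transactions) < min_sup:
                hits.add(shrink_to_minimal(s, transactions, min_sup))
        found |= hits
        if hits:
            for i in items:
                freq = sum(i in m for m in hits) / len(hits)
                p[i] = (1 - alpha) * p[i] + alpha * freq  # smoothed update
    return found

# Toy data, invented for illustration only.
TRANSACTIONS = [frozenset(t) for t in ("abc", "ab", "ac", "abc", "bc", "abd")]
MIN_SUP = 3
```

On this toy data with `MIN_SUP = 3`, exact enumeration yields two MRIs: `{d}` (support 1) and `{a, b, c}` (support 2, while each of its pairs has support at least 3). The heuristic search returns only itemsets that pass the same minimality check, but, being approximate, it is not guaranteed to return all of them.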
