Abstract

In this study, we propose an algorithm that produces high-quality approximate frequent itemsets from datasets with large numbers of transactions. With high probability, the results contain all frequent itemsets, include no itemset whose support is much lower than the minimum support, and report supports close to their true values. To avoid an over-estimated sample size and significant computational overhead, the data-reduction task is decomposed into three subproblems, which are solved one by one using sampling and information granulation. First, the algorithm estimates a rough support for every item from a sample and removes the infrequent items, thereby simplifying the data. Then a second sample is drawn from the simplified data and clustered into information granules. After this data reduction, the resulting granules are mined with an improved Apriori algorithm. A tight guarantee on the quality of the final results is provided, and the performance of the approach is quantified through a series of experiments.
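The pipeline described above can be illustrated with a minimal sketch. The code below is an assumption-laden simplification, not the authors' implementation: it estimates per-item supports from a random sample, prunes items below a relaxed threshold (the `slack` parameter is a hypothetical stand-in for the paper's probabilistic tolerance), and then mines the reduced transactions with a plain level-wise Apriori. The information-granulation (clustering) stage is omitted; the reduced data are mined directly.

```python
import random
from collections import Counter
from itertools import combinations

def estimate_item_supports(transactions, sample_size, seed=0):
    # Step 1: estimate each item's support from a uniform random sample.
    rng = random.Random(seed)
    sample = rng.sample(transactions, min(sample_size, len(transactions)))
    counts = Counter(item for t in sample for item in set(t))
    n = len(sample)
    return {item: c / n for item, c in counts.items()}

def reduce_transactions(transactions, supports, min_support, slack=0.0):
    # Step 2: drop items whose estimated support falls below a relaxed
    # threshold (min_support - slack), so that near-frequent items are
    # unlikely to be lost to sampling noise.
    keep = {i for i, s in supports.items() if s >= min_support - slack}
    return [[i for i in t if i in keep] for t in transactions]

def apriori(transactions, min_support):
    # Step 3: plain level-wise Apriori on the reduced data;
    # returns a dict mapping each frequent itemset to its support.
    n = len(transactions)
    tsets = [set(t) for t in transactions]
    counts = Counter(item for t in tsets for item in t)
    freq = {frozenset([i]): c / n for i, c in counts.items()
            if c / n >= min_support}
    result = dict(freq)
    k = 2
    while freq:
        # Candidate generation: join frequent (k-1)-itemsets and keep
        # only candidates whose every (k-1)-subset is frequent.
        items = sorted({i for s in freq for i in s})
        candidates = [frozenset(c) for c in combinations(items, k)
                      if all(frozenset(sub) in freq
                             for sub in combinations(c, k - 1))]
        counts = Counter()
        for t in tsets:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        freq = {c: cnt / n for c, cnt in counts.items()
                if cnt / n >= min_support}
        result.update(freq)
        k += 1
    return result
```

For example, on five transactions where items `a` and `b` co-occur in three, calling `estimate_item_supports`, then `reduce_transactions` with a minimum support of 0.6, then `apriori` on the reduced data recovers `{a, b}` with support 0.6.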
