The mining of frequent itemsets and association rules is a core problem in data mining and an essential task in data analysis. In this paper, we present SufRec, a new algorithm for finding frequent itemsets and association rules. We give two versions. In both versions, the mining of the frequent itemsets is decomposed into a sequence of tasks. The first version (the SufRecDep algorithm) processes the tasks successively, each one using the results of the previous ones. The second version (the SufRecInd algorithm) performs the tasks independently of each other. Both versions are recursive with respect to the items and thus have the advantage of being particularly efficient for updating the mining process when new items are added to the database or when others are excluded. Moreover, the task-independent processing of SufRecInd makes it very easy to build parallel versions of SufRec. We present two parallel SufRec algorithms. In the first one, P-FIFO-SufRec, the processors perform the tasks in the order in which the items appear in the database, preserving the recursive nature of SufRec. In the second one, P-PAST-SufRec, the set of items is fixed and the tasks are pre-assigned to the processors. To evaluate the performance of SufRec, we carry out an extensive experimental study. In particular, we compare its running times with those of well-known parallel and non-parallel frequent itemset mining algorithms.
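To make the idea of decomposing the mining into one task per item more concrete, here is a minimal sketch. It is not the SufRec algorithm, whose details the abstract does not give. It assumes a generic decomposition in which the task for an item finds every frequent itemset whose last item, in a fixed item order, is that item. The names `task_for_item` and `mine`, the brute-force enumeration, and the toy database are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def task_for_item(transactions, items, index, min_support):
    """Frequent itemsets whose last item (in `items` order) is items[index].

    Illustrative assumption only: the task looks at no other task's output,
    so tasks could be run independently or in parallel.
    """
    last = items[index]
    earlier = items[:index]
    earlier_set = set(earlier)
    # Keep the transactions containing `last`, restricted to earlier items.
    projected = [t & earlier_set for t in transactions if last in t]
    result = {}
    # Brute-force enumeration of candidate prefixes (exponential, demo only).
    for size in range(len(earlier) + 1):
        for prefix in combinations(earlier, size):
            support = sum(1 for t in projected if set(prefix) <= t)
            if support >= min_support:
                result[prefix + (last,)] = support
    return result


def mine(transactions, items, min_support):
    """Union of the per-item task results, mapping itemset to support."""
    frequent = {}
    for index in range(len(items)):
        frequent.update(task_for_item(transactions, items, index, min_support))
    return frequent


# Toy database: each transaction is a set of items.
db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
frequent = mine(db, ["a", "b", "c"], min_support=2)

# Appending a new item "d" only requires running the one new task.
db_d = [{"a", "b", "c", "d"}, {"a", "b", "d"}, {"a", "c"}, {"b", "c"}]
new_only = task_for_item(db_d, ["a", "b", "c", "d"], 3, min_support=2)
```

Under this assumed decomposition, the itemsets found for the earlier items do not change when a new item is appended to the order, which is one way to read the claim that an item-recursive scheme is cheap to update.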