An algorithm for discovering the frequent closed itemsets in a large database

Ningthoujam Gourakishwar Singh, Sanasam Ranbir Singh, Anjana K Mahanta, Bhanu Prasad

doi:10.1080/09528130600975758

Abstract

Previous research has shown that the problem of discovering the complete set of frequent itemsets in a large database can be reduced to the problem of discovering the frequent closed itemsets, which yields a much smaller set of itemsets without loss of information. This article is based on the observation that the set of all itemsets can be grouped into non-overlapping clusters such that each cluster is identified by a unique closed tidset. Each cluster contains exactly one closed itemset, and that itemset is the superset of all itemsets in the cluster, which share the same support. The problem of discovering closed itemsets can therefore be treated as the problem of clustering the set of itemsets and identifying each cluster by its unique closed tidset. This article presents CloseMiner, a new algorithm that discovers all frequent closed itemsets by grouping the set of itemsets into non-overlapping clusters. Experimental evaluation on a number of real and synthetic databases shows that CloseMiner outperforms the existing systems APRIORI and CHARM.
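The clustering observation can be illustrated on a toy database. The sketch below is not CloseMiner itself: it is a brute-force enumeration, written only to show that itemsets sharing a tidset form one cluster and that the union of a cluster's members is its single closed itemset. The function names, the dictionary layout of the database, and the example transactions are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def tidset(itemset, db):
    """Return the ids of all transactions that contain every item of itemset."""
    return frozenset(tid for tid, items in db.items() if itemset <= items)


def closed_itemsets_by_cluster(db, min_support):
    """Group frequent itemsets by tidset; map each tidset to its closed itemset.

    Brute force over all item combinations, so only suitable for tiny inputs.
    """
    items = sorted(set().union(*db.values()))
    clusters = defaultdict(list)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            tids = tidset(candidate, db)
            if len(tids) >= min_support:
                clusters[tids].append(candidate)
    # Itemsets with the same tidset T have a union whose tidset is also T,
    # so the union is the largest member of the cluster: the closed itemset.
    return {tids: frozenset().union(*members) for tids, members in clusters.items()}


# Hypothetical toy database: transaction id -> set of items.
db = {
    1: {"a", "c", "d"},
    2: {"b", "c", "e"},
    3: {"a", "b", "c", "e"},
    4: {"b", "e"},
}

for tids, closed in closed_itemsets_by_cluster(db, min_support=2).items():
    print(sorted(tids), sorted(closed))
```

With a minimum support of 2, this toy database has nine frequent itemsets but only four clusters, and therefore four frequent closed itemsets: {a, c}, {b, e}, {c} and {b, c, e}. Each frequent itemset and its support can be recovered from the closed itemset of its cluster, which is the sense in which no information is lost.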
