Mining Approximate Frequency Itemsets over Data Streams Based on D-Hash Table

Chunhua Ju, Gang You

doi:10.1109/snpd.2009.29

Abstract

Frequent itemset (or frequent pattern) mining, a basic step in data stream mining, has attracted growing attention from researchers. Because data streams are uncertain and continuous, the time and space costs of many mining algorithms are unacceptable. In this paper, a hash table is introduced as the synopsis data structure, which reduces the memory footprint of the Lossy Counting algorithm. In addition, an algorithm for mining frequent itemsets based on a D-Hash Table (MFS-HT for short) is proposed to find the itemsets in a data stream whose frequency counts exceed a user-specified threshold. Experimental results show that MFS-HT is more efficient in both time and space than Lossy Counting and a similar algorithm, Mining Frequent Itemsets over data Streams by Matrix (MISM for short).
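The abstract does not describe how the D-Hash Table is organized, so MFS-HT itself is not reproduced here. As a point of reference, the following is a minimal sketch of the baseline it builds on: standard Lossy Counting (Manku and Motwani) with its synopsis held in a hash table, here a Python `dict`. To stay short it tracks single items rather than itemsets; the class name `LossyCounter` and the parameters `epsilon` (error bound) and `support` (threshold) are illustrative assumptions, not names from the paper.

```python
import math
from typing import Dict, Hashable, List, Tuple


class LossyCounter:
    """Hypothetical sketch: Lossy Counting over single items, with the
    synopsis stored in a dict (hash table). Not the paper's MFS-HT.

    Each entry maps item -> (f, delta): f counts occurrences seen since the
    entry was inserted; delta bounds how many earlier occurrences were missed.
    """

    def __init__(self, epsilon: float) -> None:
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)  # bucket width w = ceil(1/epsilon)
        self.n = 0                            # number of items seen so far (N)
        self.table: Dict[Hashable, Tuple[int, int]] = {}

    def add(self, item: Hashable) -> None:
        self.n += 1
        bucket = math.ceil(self.n / self.width)  # id of the current bucket
        if item in self.table:
            f, delta = self.table[item]
            self.table[item] = (f + 1, delta)
        else:
            # A new entry may have been pruned earlier, so up to
            # (bucket - 1) occurrences could have been missed.
            self.table[item] = (1, bucket - 1)
        # At each bucket boundary, drop entries with f + delta <= bucket.
        if self.n % self.width == 0:
            self.table = {
                k: (f, d) for k, (f, d) in self.table.items() if f + d > bucket
            }

    def frequent(self, support: float) -> List[Hashable]:
        """Return items whose stored count f >= (support - epsilon) * N."""
        threshold = (support - self.epsilon) * self.n
        return [k for k, (f, _) in self.table.items() if f >= threshold]


if __name__ == "__main__":
    counter = LossyCounter(epsilon=0.1)
    for x in ["a", "b", "a", "c", "a", "d", "a", "e", "a", "f"] * 10:
        counter.add(x)
    print(counter.frequent(support=0.3))
```

With a `dict`, each arriving item costs an expected constant-time lookup or insertion, and pruning at every bucket boundary keeps the table to at most (1/ε)·log(εN) entries. Every item whose true frequency is at least sN is reported, and stored counts underestimate true counts by at most εN. This per-entry memory is the overhead the paper's hash-table synopsis aims to reduce.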
