Abstract

Frequent itemset (or frequent pattern) mining, a basic step in data stream mining, has attracted increasing attention from researchers. Because data streams are uncertain and continuous, the time and space efficiency of many mining algorithms is unacceptable. In this paper, a hash table is introduced to represent the synopsis data structure, which reduces the memory footprint of Lossy Counting algorithms. In addition, an algorithm for frequent itemset mining based on the D-Hashed Table (MFS-HT for short) is proposed to obtain the items whose frequency count exceeds a user-specified threshold in data streams. Experimental results show that, compared with Lossy Counting and a similar algorithm called Mining Frequent Item sets over data Streams by Matrix (MISM for short), MFS-HT is more efficient in both time and space.
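For readers unfamiliar with the baseline, the following is a minimal sketch of the classic Lossy Counting algorithm for single items, with a Python dict standing in for a hash-table synopsis. It is not the MFS-HT algorithm or the D-Hashed Table proposed in the paper, whose details the abstract does not give. The function names `lossy_counting` and `frequent_items` and the parameters `epsilon` (error bound) and `support` (user-specified threshold) are assumptions made for illustration.

```python
from math import ceil


def lossy_counting(stream, epsilon):
    """Classic Lossy Counting over single items (illustrative sketch).

    Returns the synopsis table and the number of items seen.
    Each table entry maps item -> [count, delta], where delta is the
    maximum possible undercount for that item.
    """
    width = ceil(1 / epsilon)  # bucket width
    table = {}                 # hash-table synopsis (assumed layout)
    n = 0
    for item in stream:
        n += 1
        bucket = ceil(n / width)  # id of the current bucket
        if item in table:
            table[item][0] += 1
        else:
            table[item] = [1, bucket - 1]
        if n % width == 0:
            # At each bucket boundary, drop entries that cannot be frequent.
            stale = [k for k, (c, d) in table.items() if c + d <= bucket]
            for k in stale:
                del table[k]
    return table, n


def frequent_items(table, n, support, epsilon):
    """Report items whose stored count is at least (support - epsilon) * n."""
    return {k: c for k, (c, d) in table.items() if c >= (support - epsilon) * n}


if __name__ == "__main__":
    table, n = lossy_counting("abacaadaba", epsilon=0.2)
    print(frequent_items(table, n, support=0.5, epsilon=0.2))
```

The synopsis holds only items that may still be frequent, which is what bounds memory use; the paper's contribution, per the abstract, is a more compact hash-based representation of this kind of synopsis.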
