ABSTRACT With the development of online services, databases have shifted from static structures to dynamic stream structures. Earlier data mining techniques have served as decision-making tools, for example in establishing marketing strategies and in DNA analysis. However, areas of recent interest such as sensor networks, robotics, and artificial intelligence require the capability to analyze real-time data more quickly. Landmark window-based frequent pattern mining, one of the stream mining approaches, performs mining operations on parts of a database or on each transaction, instead of on all the data. In this paper, we analyze and evaluate the techniques of two well-known landmark window-based frequent pattern mining algorithms, Lossy counting and hMiner. When Lossy counting mines frequent patterns from a set of new transactions, it performs union operations between the previous and current mining results. hMiner, a state-of-the-art algorithm based on the landmark window model, conducts mining operations whenever a new transaction occurs. Since hMiner extracts frequent patterns as soon as a new transaction is entered, we can obtain the latest mining results reflecting real-time information. For this reason, such algorithms are also called online mining approaches. We evaluate and compare the performance of the primitive algorithm, Lossy counting, and the latest one, hMiner. As the criteria of our performance analysis, we first consider the algorithms' total runtime and average processing time per transaction. In addition, to compare the efficiency of their storage structures, we also evaluate their maximum memory usage. Lastly, we show how stably the two algorithms conduct their mining work on databases in which the number of items gradually increases.
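As a rough illustration of the landmark-window idea described above, the following minimal sketch shows the classic Lossy Counting scheme in a deliberately simplified form that counts single items rather than itemsets. It is not the implementation evaluated in this paper: the class name `LossyCounter`, the parameter `epsilon`, and the dictionary-based storage are assumptions made for illustration only.

```python
import math


class LossyCounter:
    """Simplified Lossy Counting over single items (illustrative only)."""

    def __init__(self, epsilon):
        self.epsilon = epsilon                # maximum allowed frequency error
        self.width = math.ceil(1 / epsilon)   # number of elements per bucket
        self.n = 0                            # elements seen since the landmark
        self.entries = {}                     # item -> [count, max_undercount]

    def add(self, item):
        self.n += 1
        bucket = math.ceil(self.n / self.width)  # id of the current bucket
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            # A new entry may have been missed in up to (bucket - 1) buckets.
            self.entries[item] = [1, bucket - 1]
        # At each bucket boundary, drop entries that cannot be frequent.
        if self.n % self.width == 0:
            self.entries = {
                key: value
                for key, value in self.entries.items()
                if value[0] + value[1] > bucket
            }

    def frequent(self, support):
        """Return items whose stored count is at least (support - epsilon) * n."""
        threshold = (support - self.epsilon) * self.n
        return {
            key: value[0]
            for key, value in self.entries.items()
            if value[0] >= threshold
        }


# Hypothetical stream: "a" on every other position, unique items elsewhere.
counter = LossyCounter(epsilon=0.1)
for i in range(100):
    counter.add("a" if i % 2 == 0 else f"x{i}")
print(counter.frequent(support=0.3))
```

In this sketch the infrequent one-off items are pruned at every bucket boundary, so only the recurring item remains in memory. Extending the idea to itemsets requires a pattern store, which is where the lattice and hash structures compared in this paper come in.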
In the evaluation of mining time and transaction processing, hMiner is faster than Lossy counting. Since hMiner stores candidate frequent patterns in a hash structure, it can access them directly. Lossy counting, in contrast, stores them in a lattice; thus, it has to traverse multiple nodes in order to access a candidate frequent pattern. On the other hand, hMiner performs worse than Lossy counting in terms of maximum memory usage. hMiner must keep all of the information for each candidate frequent pattern in order to store it in a hash bucket, while Lossy counting stores the patterns in a reduced form by using the lattice. Since the storage of Lossy counting can share items that are included in multiple patterns, its memory usage is more efficient than that of hMiner. However, hMiner is more efficient than Lossy counting in the scalability evaluation for the following reasons. If the number of items is