A performance study of three disk-based structures for indexing and querying frequent itemsets

Guimei Liu, Limsoon Wong, Andre Suchitra

doi:10.14778/2536349.2536351

Abstract

Frequent itemset mining is an important problem in data mining, and extensive effort has been devoted to developing efficient algorithms for it. However, little attention has been paid to managing the large collections of frequent itemsets that these algorithms produce, whether for subsequent analysis or for user exploration. In this paper, we study three structures for indexing and querying frequent itemsets: inverted files, signature files, and the CFP-tree. The first two have been widely used for indexing general set-valued data; we modify them to make them more suitable for indexing frequent itemsets. The CFP-tree is specifically designed for storing frequent itemsets; we add a pruning technique based on length-2 frequent itemsets to make it more efficient at processing superset queries. We study the performance of the three structures in supporting five types of containment queries: exact match, subset search, superset search, immediate subset search, and immediate superset search. Our results show that no single structure outperforms the others on all five query types across all datasets. Overall, the CFP-tree performs better than the other two structures.
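
The five containment queries named in the abstract can be illustrated with a small in-memory sketch. This is not any of the three disk-based structures studied in the paper; it is a naive linear scan meant only to show what each query returns. The function names are hypothetical, and two points are assumptions rather than definitions taken from the abstract: subset and superset search are treated as non-strict (the query itself is returned if stored), and an "immediate" subset or superset is taken to be a proper one with no other stored itemset lying strictly between it and the query.

```python
from typing import FrozenSet, Set

Itemset = FrozenSet[str]


def exact_match(collection: Set[Itemset], q: Itemset) -> bool:
    """Return True if q itself is stored in the collection."""
    return q in collection


def subset_search(collection: Set[Itemset], q: Itemset) -> Set[Itemset]:
    """Stored itemsets contained in q (assumed non-strict)."""
    return {x for x in collection if x <= q}


def superset_search(collection: Set[Itemset], q: Itemset) -> Set[Itemset]:
    """Stored itemsets that contain q (assumed non-strict)."""
    return {x for x in collection if x >= q}


def immediate_subset_search(collection: Set[Itemset], q: Itemset) -> Set[Itemset]:
    """Assumed meaning: proper subsets of q that are maximal in the collection."""
    proper = {x for x in collection if x < q}
    return {x for x in proper if not any(x < y for y in proper)}


def immediate_superset_search(collection: Set[Itemset], q: Itemset) -> Set[Itemset]:
    """Assumed meaning: proper supersets of q that are minimal in the collection."""
    proper = {x for x in collection if x > q}
    return {x for x in proper if not any(y < x for y in proper)}


# Toy collection: every non-empty subset of {a, b, c} is assumed frequent.
itemsets = {
    frozenset(s)
    for s in (["a"], ["b"], ["c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"])
}
query = frozenset(["a", "b"])

found = exact_match(itemsets, query)
subs = subset_search(itemsets, query)
sups = superset_search(itemsets, query)
imm_subs = immediate_subset_search(itemsets, query)
imm_sups = immediate_superset_search(itemsets, query)
```

With the query {a, b}, the subset search yields {a}, {b} and {a, b}, while the immediate superset search yields only {a, b, c}. Each call scans the whole collection, which is the cost an index structure is meant to avoid.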


Journal: Proceedings of the VLDB Endowment
Publication Date: May 1, 2013
Citations: 28

Similar Papers

A new algorithm for fast mining frequent itemsets using N-lists
Zhihong Deng ... Zhonghui Wang
Science China Information Sciences | VOL. 55
19 Jul 2012

An Efficient Matrix Algorithm for Mining Frequent Itemsets
Zhangyan Xu ... Dongyuan Gu
01 Dec 2009

Frequent Item Set Mining Algorithm Based on Bit Combination
Jun Lu ... Renpeng Zhao
01 Apr 2019

Mining frequent itemsets over data streams using efficient window sliding techniques
Hua-Fu Li ... Suh-Yin Lee
Expert Systems with Applications | VOL. 36
15 Dec 2007
