A data mining proxy approach for efficient frequent itemset mining

Jeffrey Xu Yu, Zhiheng Li, Guimei Liu

doi:10.1007/s00778-007-0047-0

Abstract

Data mining has attracted considerable research effort over the past decade. However, little work has been reported on efficiently supporting a large number of users who periodically issue different data mining queries as new needs arise and as data is updated. Our work is motivated by the fact that the pattern-growth method, which constructs an initial tree and mines frequent patterns on top of it, is one of the most efficient methods for frequent pattern mining. In this paper, we present a data mining proxy approach that can reduce the I/O cost of constructing an initial tree by utilizing trees that are already resident in memory. The tree we construct is the smallest for a given data mining query. In addition, our proxy approach can also reduce the CPU cost of mining patterns, because the cost of mining depends on the sizes of the trees. The focus of the work is on constructing an initial tree efficiently. We propose three tree operations to construct a tree. With a unique coding scheme, we can efficiently project subtrees from on-disk or in-memory trees. Our performance study indicates that the data mining proxy significantly reduces the I/O cost of constructing trees and the CPU cost of mining patterns over the constructed trees.
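
To make the "initial tree" of the pattern-growth method concrete, the following is a minimal sketch of standard FP-tree construction. It is not the proxy's three tree operations or its coding scheme, neither of which the abstract specifies. The names Node, build_fp_tree, and tree_size, and the toy transactions, are hypothetical and chosen for illustration only. The sketch shows why the tree size depends on the query: a higher minimum support prunes more items and yields a smaller tree.

```python
from collections import Counter


class Node:
    """One prefix-tree node: an item, its count, and child links (hypothetical name)."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build a plain FP-tree; this is the textbook construction, not the paper's proxy."""
    # First pass: count in how many transactions each item occurs.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i for i, c in counts.items() if c >= min_support}

    root = Node(None)
    # Second pass: insert each transaction's frequent items in a fixed global order
    # (descending support, ties broken by item name) so shared prefixes merge.
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-counts[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root


def tree_size(node):
    """Number of item nodes below `node` (the root itself is not counted)."""
    return sum(1 + tree_size(c) for c in node.children.values())


# Toy data, assumed for illustration only.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "d"],
    ["b", "c"],
    ["a", "b", "c", "e"],
]

small = build_fp_tree(transactions, min_support=3)  # only a, b, c survive
large = build_fp_tree(transactions, min_support=1)  # every item survives

print(tree_size(small), tree_size(large))
```

On this toy data the tree built with the stricter threshold has fewer nodes than the one built with the looser threshold, which is the kind of size difference the abstract ties to mining cost.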

Journal: The VLDB Journal
Publication Date: Apr 12, 2007
Citations: 49

Similar Papers

Towards data mining benchmarking
Jian Pei ... Kan Hu
ACM SIGMOD Record | VOL. 29
16 May 2000

Efficient representative pattern mining based on weight and maximality conditions
Unil Yun ... Gangin Lee
Expert Systems | VOL. 33
28 Jun 2016

D-colSimulation: A Distributed Approach for Frequent Graph Pattern Mining based on colSimulation in a Single Large Graph
Guanqi Hua ... Wei He
01 Nov 2020

Closed frequent similar pattern mining: Reducing the number of frequent similar patterns without information loss
Ansel Y Rodríguez-González ... Enrique Munoz De Cote
Expert Systems With Applications | VOL. 96
09 Dec 2017
