CP-tree: An adaptive synopsis structure for compressing frequent itemsets over online data streams

Se Jung Shin,Dae Su Lee,Won Suk Lee

doi:10.1016/j.ins.2014.03.074

Abstract

Given the characteristics of a data stream, it is important to confine the memory usage of a data mining process. This paper proposes the CP-tree (Compressible-prefix tree), which can be employed effectively to find frequent itemsets over an online data stream. Unlike a node of a prefix tree, a node of a CP-tree can maintain a concise synopsis that traces the supports of several itemsets together. As the number of itemsets traced by a single node increases, the CP-tree becomes smaller. However, its result also becomes less accurate, since the estimated supports of itemsets that are traced together by one node may contain false positive or false negative errors. Based on this trade-off, the size of a CP-tree can be controlled by merging or splitting its nodes, which allows the confined memory space to be utilized as fully as possible. As a result, the accuracy of a CP-tree is kept as high as the confined memory space allows at all times. Furthermore, a CP-tree can trace a concise set of representative frequent itemsets that collectively represent the original set of frequent itemsets.
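The size/accuracy trade-off described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: the `MergedNode` class, the `merge` helper, the choice to keep only the largest and smallest counts, and the midpoint estimator are all assumptions made here for illustration. It shows only the general idea that tracing several itemsets with one shared synopsis saves counters while introducing estimation errors.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple

Itemset = FrozenSet[str]


@dataclass(frozen=True)
class MergedNode:
    """Hypothetical synopsis: one node tracing several itemsets with two counters."""
    members: Tuple[Itemset, ...]
    max_count: int
    min_count: int

    def estimate(self, itemset: Itemset) -> float:
        """Estimated count of a member itemset (assumed estimator: the midpoint)."""
        if itemset not in self.members:
            raise KeyError(itemset)
        return (self.max_count + self.min_count) / 2

    def max_error(self) -> float:
        """Worst-case gap between the estimate and any member's true count."""
        return (self.max_count - self.min_count) / 2


def merge(exact: Dict[Itemset, int]) -> MergedNode:
    """Replace one exact counter per itemset with a single two-counter synopsis."""
    counts = list(exact.values())
    return MergedNode(tuple(exact), max(counts), min(counts))


# Three itemsets with exact counts (illustrative numbers, not from the paper).
exact = {frozenset("a"): 10, frozenset("ab"): 8, frozenset("abc"): 7}
node = merge(exact)

threshold = 8  # assumed minimum support count
reported = {s for s in node.members if node.estimate(s) >= threshold}
truly_frequent = {s for s, c in exact.items() if c >= threshold}
false_positives = reported - truly_frequent  # itemsets wrongly reported as frequent
```

In this toy case the merged node stores two counters instead of three, and the itemset {a, b, c} becomes a false positive because its estimated count exceeds the threshold while its true count does not. Splitting the node again would remove that error at the cost of more memory, which is the knob the abstract describes.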

Journal: Information Sciences	Publication Date: Mar 24, 2014
Citations: 24

Similar Papers

A false negative approach to mining frequent itemsets from high speed transactional data streams
Jeffrey Xu Yu ... Aoying Zhou
Information Sciences | VOL. 176
29 Nov 2005

Effect of Count Estimation in Finding Frequent Itemsets over Online Transactional Data Streams
Joong Hyuk Chang ... Won Suk Lee
Journal of Computer Science and Technology | VOL. 20
01 Jan 2004

The neural network models for IDS based on the asymmetric costs of false negative errors and false positive errors
Daejoon Joo ... Ingoo Han
Expert Systems with Applications | VOL. 25
06 Feb 2003

Finding frequent itemsets over online data streams
Joong Hyuk Chang ... Won Suk Lee
Information and Software Technology | VOL. 48
09 Aug 2005
