An Adaptive Grid-Based Clustering Method for Multidimensional Data Streams (적응적 격자기반 다차원 데이터 스트림 클러스터링 방법)

Nam-Hun Park, Won-Suk Lee

doi:10.3745/kipstd.2007.14-d.7.733

Abstract

A data stream is a massive, unbounded sequence of data elements generated continuously at a rapid rate. Although new data elements keep arriving, the memory used for data stream analysis must stay within a finite bound, so data stream processing accepts some error in its analysis results in exchange for meeting this constraint. To find the current clusters of an evolving data stream, old distribution statistics are diminished by a predefined decay rate as time goes by, so that obsolete information stops affecting the current clustering result without any data element being stored physically. This paper proposes a grid-based clustering algorithm for data streams. Given a set of initial grid cells, the dense range of a grid cell is recursively partitioned into smaller cells in a top-down manner, based on the distribution statistics of its data elements, until the smallest cell, called a unit cell, is identified. Since the dynamically partitioned grid cells maintain only the distribution statistics of data elements, the clusters of a data stream can be found more efficiently than with existing clustering methods, without storing or scanning the data elements themselves. Furthermore, the memory usage of the proposed algorithm adapts to the size of the confined memory space by flexibly resizing the unit cell. As a result, the confined memory space can be fully utilized to make the clustering result as accurate as possible. The proposed algorithm is analyzed through a series of experiments to identify its various characteristics.
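The abstract describes the method only at a high level. As a hedged illustration, the Python sketch below shows the general shape of such a scheme: each grid cell keeps only decayed distribution statistics (a count and per-dimension sums) instead of the data elements, a cell whose decayed count reaches a threshold is partitioned top-down until unit cells are reached, and touching dense unit cells are grouped into clusters. Everything here is an assumption rather than the authors' algorithm: the names (`GridCell`, `find_clusters`, `cluster_center`, `min_count`), the constants `DECAY`, `SPLIT_COUNT`, and `UNIT_SIDE`, the lazy per-element decay, and two simplifications, namely halving a cell at the midpoint of its longest side instead of partitioning it by its distribution statistics, and using a fixed `UNIT_SIDE` instead of resizing the unit cell to fit the memory budget.

```python
# Hypothetical sketch of grid-based stream clustering; not the authors' algorithm.
DECAY = 0.999      # hypothetical decay factor, applied once per arriving element
SPLIT_COUNT = 8.0  # hypothetical decayed count at which a non-unit cell is split
UNIT_SIDE = 0.5    # hypothetical side length at which a cell counts as a unit cell


class GridCell:
    """Axis-aligned cell that keeps only decayed distribution statistics."""

    def __init__(self, low, high, t=0):
        self.low, self.high = list(low), list(high)
        self.count = 0.0              # decayed number of elements seen
        self.sums = [0.0] * len(low)  # decayed per-dimension sums (for means)
        self.last_t = t               # time of the last statistics update
        self.split = None             # (dimension, midpoint) once partitioned
        self.children = []

    def decayed_count(self, t):
        # Weight of the cell as of time t, without mutating the statistics.
        return self.count * DECAY ** (t - self.last_t)

    def _decay_to(self, t):
        # Lazy decay: one factor of DECAY per element that arrived since last_t.
        factor = DECAY ** (t - self.last_t)
        self.count *= factor
        self.sums = [s * factor for s in self.sums]
        self.last_t = t

    def is_unit(self):
        return all(h - l <= UNIT_SIDE for l, h in zip(self.low, self.high))

    def insert(self, x, t):
        # Update this cell's statistics; the element itself is never stored.
        self._decay_to(t)
        self.count += 1.0
        self.sums = [s + v for s, v in zip(self.sums, x)]
        if self.split is not None:
            dim, mid = self.split
            self.children[0 if x[dim] < mid else 1].insert(x, t)
        elif self.count >= SPLIT_COUNT and not self.is_unit():
            self._partition(t)

    def _partition(self, t):
        # Simplification: halve the longest side; children start with empty stats.
        dim = max(range(len(self.low)), key=lambda i: self.high[i] - self.low[i])
        mid = (self.low[dim] + self.high[dim]) / 2
        left_high, right_low = list(self.high), list(self.low)
        left_high[dim], right_low[dim] = mid, mid
        self.children = [GridCell(self.low, left_high, t),
                         GridCell(right_low, self.high, t)]
        self.split = (dim, mid)

    def leaves(self):
        if self.split is None:
            yield self
        else:
            for child in self.children:
                yield from child.leaves()


def touching(a, b):
    # Closed boxes that overlap or share a face, edge, or corner.
    return all(la <= hb and lb <= ha
               for la, ha, lb, hb in zip(a.low, a.high, b.low, b.high))


def find_clusters(root, t, min_count):
    """Group touching dense unit cells into connected components (clusters)."""
    dense = [c for c in root.leaves()
             if c.is_unit() and c.decayed_count(t) >= min_count]
    clusters, seen = [], set()
    for start in range(len(dense)):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            j = stack.pop()
            component.append(dense[j])
            for k in range(len(dense)):
                if k not in seen and touching(dense[j], dense[k]):
                    seen.add(k)
                    stack.append(k)
        clusters.append(component)
    return clusters


def cluster_center(cells, t):
    # Decay-weighted mean of the member cells' statistics as of time t.
    weight = sum(c.decayed_count(t) for c in cells)
    return [sum(c.sums[i] * DECAY ** (t - c.last_t) for c in cells) / weight
            for i in range(len(cells[0].low))]
```

As a usage example, feeding the points (0.25, 0.25) and (3.75, 3.75) alternately, 200 times each, into `GridCell([0.0, 0.0], [4.0, 4.0])` with `t` increasing by one per element, then calling `find_clusters(root, t, min_count=5.0)`, yields two single-cell clusters: the unit cells [0, 0.5] × [0, 0.5] and [3.5, 4] × [3.5, 4]. Grouping touching dense cells into connected components is a common final step in grid-based clustering; the abstract does not say how the paper itself forms clusters from dense unit cells.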

Journal: The KIPS Transactions: Part D
Publication Date: Dec 31, 2007
Citations: 15

Similar Papers

Statistical grid-based clustering over data streams
Nam Hun Park ... Won Suk Lee
ACM SIGMOD Record, Vol. 33, 01 Mar 2004

Finding recently frequent itemsets adaptively over online transactional data streams
Joong Hyuk Chang ... Won Suk Lee
Information Systems, Vol. 31, 31 May 2005

A Statistical μ-Partitioning Method for Clustering Data Streams
Nam Hun Park ... Won Suk Lee
01 Jan 2003

An Efficient Sliding Window Method for Open Data Mining (개방 데이터 마이닝에 효율적인 이동 윈도우 기법)
Joong-Hyuk Chang ... Won-Suk Lee
The KIPS Transactions: Part D, Vol. 12D, 01 Jun 2005
