Streamed Sampling on Dynamic data as Support for Classification Model

Astried Silvanie, Taufik Djatna, Heru Sukoco

doi:10.12928/telkomnika.v11i4.1210

Abstract

Data mining on dynamically changing data faces several problems, such as an unknown data size and a changing class distribution. Random sampling is commonly applied to extract a general synopsis from a very large database. In this research, Vitter's reservoir algorithm is used to retrieve k records from the database and place them in the sample, which then serves as the input for a classification task in data mining. The sample is a backing sample, stored as a table containing the id, priority, and timestamp of each record; the priority reflects how long a record is likely to be retained in the sample. The Kullback-Leibler divergence is applied to measure the similarity between the database distribution and the sample distribution. The results show that samples can be taken randomly and continuously as transactions occur. A Kullback-Leibler divergence in the interval from 0 to 0.0001 is a very good criterion for maintaining a similar class distribution between the database and the sample. The sample stays up to date with new transactions while keeping a similar class distribution. A classifier built from a balanced class distribution was shown to perform better than one built from an imbalanced distribution.
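The abstract names Vitter's reservoir algorithm without giving its steps. The following is a minimal sketch that assumes the basic Algorithm R described in Vitter's 1985 paper. The paper itself may use a faster variant, such as Algorithm Z, and it also adds priority and timestamp bookkeeping that is not shown here. The sketch keeps a uniform random sample of k record ids from a stream whose length is not known in advance. The function name `reservoir_sample` and its parameters are hypothetical and are not taken from the paper.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Uniform random sample of k items from a stream of unknown length.

    Hypothetical helper illustrating Algorithm R; not the paper's code.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items of the stream.
            reservoir.append(item)
        else:
            # Item i (0-based) enters the sample with probability k / (i + 1),
            # replacing a uniformly chosen slot; randint is inclusive on both ends.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir
```

For example, `reservoir_sample(range(1, 10_001), k=100, rng=random.Random(42))` returns 100 distinct ids. Each of the 10,000 records has the same 100/10,000 chance of being selected. The scheme needs only a single pass and O(k) memory, which is what makes it usable when the total data size is unknown.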

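The abstract does not state how the Kullback-Leibler divergence is computed. The following is a minimal sketch that assumes the divergence is taken over class labels as D_KL(P || Q) = Σ_c P(c) ln(P(c) / Q(c)). Here P is the database's class distribution and Q is the sample's class distribution. The sketch uses the 0.0001 upper bound reported in the abstract as the acceptance threshold. The direction of the divergence, the natural logarithm, and the helper names `class_distribution`, `kl_divergence`, and `sample_is_representative` are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Iterable, Mapping


def class_distribution(labels: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Relative frequency of each class label (hypothetical helper)."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty label collection")
    return {c: n / total for c, n in counts.items()}


def kl_divergence(p: Mapping[Hashable, float], q: Mapping[Hashable, float]) -> float:
    """D_KL(P || Q) in nats; the direction P = database, Q = sample is assumed."""
    total = 0.0
    for c, pc in p.items():
        if pc == 0:
            continue  # 0 * ln(0 / q) is taken as 0
        qc = q.get(c, 0.0)
        if qc == 0:
            return math.inf  # a database class is missing from the sample
        total += pc * math.log(pc / qc)
    return total


# Upper end of the 0 to 0.0001 interval reported in the abstract.
KL_THRESHOLD = 1e-4


def sample_is_representative(db_labels: Iterable[Hashable],
                             sample_labels: Iterable[Hashable],
                             threshold: float = KL_THRESHOLD) -> bool:
    """True if the sample's class distribution is close enough to the database's."""
    p = class_distribution(db_labels)
    q = class_distribution(sample_labels)
    return kl_divergence(p, q) <= threshold
```

Consider a database split 600/400 between two classes. A 60/40 sample gives a divergence of 0 and is accepted. A 70/30 sample gives about 0.0226 and is rejected.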
Journal: TELKOMNIKA (Telecommunication Computing Electronics and Control)
Publication Date: Dec 1, 2013
Citations: 8
License type: CC BY-SA

Similar Papers

The Application of the Ant Colony Decision Rule Algorithm on Distributed Data Mining
Linquan Xie ... Hongbiao Mei
Communications of the IIMA | VOL. 7
01 Jun 2014

Integrating Weight with Ensemble to Handle Changes in Class Distribution
Nachai Limsetto ... Kitsana Waiyamai
01 Jan 2014

Towards an optimal feature subset selection
O.A Shiba ... M.N Sulaiman
25 Aug 2003

Data Mining
Gisele L Pappa ... Alex A Freitas
28 Oct 2009
