Mining With Noise Knowledge: Error-Aware Data Mining

Xindong Wu, Xingquan Zhu

doi:10.1109/tsmca.2008.923034

Abstract

Real-world data mining deals with noisy information sources, where inaccurate data collection, device limitations, data transmission and discretization errors, or man-made perturbations frequently result in imprecise or vague data. Two common practices are either to adopt data cleansing approaches that enhance data consistency or simply to treat noisy data as quality sources and feed them into data mining algorithms. Either way may substantially sacrifice mining performance. In this paper, we consider an error-aware (EA) data mining design, which takes advantage of statistical error information (such as noise level and noise distribution) to improve data mining results. We assume that such noise knowledge is available in advance, and we propose a solution to incorporate it into the mining process. More specifically, we use noise knowledge to restore the original data distributions, which are then used to rectify the model built from noise-corrupted data. We materialize this concept in the proposed EA naive Bayes classification algorithm. Experimental comparisons on real-world datasets demonstrate the effectiveness of this design.
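The central idea, using known noise statistics to restore the original value distributions before building a naive Bayes model, can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: each attribute is corrupted independently with a known noise level, a corrupted value is replaced uniformly at random by one of the other values, and class labels are noise-free. Under that hypothetical model the observed distribution p_obs of an attribute equals T^T · p_orig, where T[i, j] is the probability that original value i is recorded as j, so p_orig can be recovered by solving a linear system. The functions `uniform_noise_matrix`, `restore_distribution`, `fit_ea_naive_bayes`, and `predict` are hypothetical names for illustration; this is a simplified stand-in, not necessarily the EA naive Bayes procedure described in the paper.

```python
import numpy as np


def uniform_noise_matrix(k: int, noise_level: float) -> np.ndarray:
    """Hypothetical noise model: with probability `noise_level`, a value is
    replaced by one of the other k - 1 values chosen uniformly at random.
    Entry [i, j] is P(observed = j | original = i); each row sums to 1."""
    if not 0.0 <= noise_level < (k - 1) / k:
        raise ValueError("noise_level must lie in [0, (k-1)/k) for T to be invertible")
    m = np.full((k, k), noise_level / (k - 1))
    np.fill_diagonal(m, 1.0 - noise_level)
    return m


def restore_distribution(observed: np.ndarray, transition: np.ndarray) -> np.ndarray:
    """Solve observed = transition.T @ original for the noise-free distribution."""
    original = np.linalg.solve(transition.T, observed)
    # Sampling error can push estimates to or below zero; clip (also avoids log(0))
    # and renormalize so the result is a valid probability distribution.
    original = np.clip(original, 1e-9, None)
    return original / original.sum()


def fit_ea_naive_bayes(X, y, n_values, noise_levels):
    """X: integer-coded attributes (n_samples, n_attrs), possibly noisy.
    y: class labels, assumed noise-free in this sketch.
    n_values[a]: number of distinct values of attribute a.
    noise_levels[a]: known noise level of attribute a (the 'noise knowledge')."""
    classes, class_counts = np.unique(y, return_counts=True)
    priors = class_counts / class_counts.sum()
    tables = {}
    for c in classes:
        Xc = X[y == c]
        for a, k in enumerate(n_values):
            counts = np.bincount(Xc[:, a], minlength=k) + 1.0  # Laplace smoothing
            observed = counts / counts.sum()
            T = uniform_noise_matrix(k, noise_levels[a])
            # Rectify P(attribute a | class c) using the restored distribution.
            tables[(c, a)] = restore_distribution(observed, T)
    return classes, priors, tables


def predict(x, classes, priors, tables):
    """Standard naive Bayes decision rule applied to the restored tables."""
    scores = [
        np.log(priors[i]) + sum(np.log(tables[(c, a)][v]) for a, v in enumerate(x))
        for i, c in enumerate(classes)
    ]
    return classes[int(np.argmax(scores))]
```

For example, with three attribute values and a 30% noise level, distorting the distribution [0.7, 0.2, 0.1] through `T.T @ original` and passing the result to `restore_distribution` recovers approximately [0.7, 0.2, 0.1]. Restoring the distributions and then reusing the ordinary naive Bayes decision rule keeps the classifier itself unchanged; only its probability estimates are corrected.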

More From: IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans

Journal: IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans
Publication Date: Jul 1, 2008
Citations: 137

Similar Papers

Error awareness data mining
Xingquan Zhu ... Xindong Wu
10 May 2006

Performance Comparative in Classification Algorithms Using Real Datasets
Hanuman T Raghava Nm
Journal of Computer Science & Systems Biology | VOL. 02
01 Jan 2009

Introduction to 3DM: Domain-Oriented Data-Driven Data Mining
Guoyin Wang
17 May 2008

Mining with Noise Knowledge: Error Aware Data Mining
Xindong Wu
01 Dec 2007