Abstract
Data-intensive applications use empirical methods to extract consistent information from huge samples. When applied to classification tasks, their aim is to optimize accuracy on unseen data; hence, a reliable prediction of the generalization error is of paramount importance. Theoretical models, such as Statistical Learning Theory, and empirical estimations, such as cross-validation, can both fit data-mining classification domains very well, provided some crucial assumptions are verified in advance. In particular, the assumption that the observed data follow a stationary distribution is critical, although it is sometimes overlooked in practice. The paper formulates an operative criterion to verify the stationarity assumption; the method applies to both theoretical and practical predictions of generalization errors. The analysis addresses the specific case of clustering-based classifiers: the K-Winner Machine (KWM) model is used as a reference for its known theoretical bounds, while cross-validation provides an empirical counterpart for practical comparison. The criterion, based on efficient unsupervised clustering-based estimation of probability distributions, is tested experimentally on a set of different, data-intensive applications, including intrusion detection for computer-network security, optical character recognition, text mining, and pedestrian detection. Experimental results confirm the effectiveness of the proposed approach in efficiently detecting non-stationarity.
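The abstract describes a stationarity check built on unsupervised clustering-based estimation of the data distribution. The sketch below is a minimal illustration of that general idea, not the paper's actual criterion: it fits cluster centroids on training data (a few Lloyd iterations of k-means, standing in for whatever vector quantizer the paper uses), estimates each sample's distribution as a cluster-occupancy histogram, and compares the training and test histograms by total-variation distance. The function names, the choice of k, and the use of total-variation distance are all illustrative assumptions.

```python
import numpy as np


def cluster_histogram(data, centroids):
    """Assign each sample to its nearest centroid; return the normalized
    per-cluster occupancy histogram (an empirical distribution estimate)."""
    dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    counts = np.bincount(labels, minlength=len(centroids)).astype(float)
    return counts / counts.sum()


def stationarity_score(train, test, k=8, n_iter=10, seed=0):
    """Total-variation distance between the cluster-occupancy histograms
    of a training sample and a later test sample.

    A score near 0 is consistent with a stationary distribution; a large
    score flags a possible distribution shift.  The decision threshold is
    application dependent (hypothetical here, as is this whole recipe)."""
    rng = np.random.default_rng(seed)
    # Crude k-means: start from random training points, run Lloyd updates.
    centroids = train[rng.choice(len(train), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(
            train[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = train[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    p = cluster_histogram(train, centroids)
    q = cluster_histogram(test, centroids)
    # Total-variation distance between the two discrete distributions.
    return 0.5 * np.abs(p - q).sum()
```

On two samples drawn from the same Gaussian, the score stays small; shifting the test sample's mean inflates it, signaling non-stationarity. The KWM-specific bounds discussed in the paper would replace this generic distance with the criterion derived there.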