Adjusting the outputs of a classifier to new a priori probabilities: a simple procedure.

Marco Saerens, Christine Decaestecker, Patrice Latinne

doi:10.1162/089976602753284446

Abstract

It sometimes happens (for instance in case control studies) that a classifier is trained on a data set that does not reflect the true a priori probabilities of the target classes on real-world data. This may have a negative effect on the classification accuracy obtained on the real-world data set, especially when the classifier's decisions are based on the a posteriori probabilities of class membership. Indeed, in this case, the trained classifier provides estimates of the a posteriori probabilities that are not valid for this real-world data set (they rely on the a priori probabilities of the training set). Applying the classifier as is (without correcting its outputs with respect to these new conditions) on this new data set may thus be suboptimal. In this note, we present a simple iterative procedure for adjusting the outputs of the trained classifier with respect to these new a priori probabilities without having to refit the model, even when these probabilities are not known in advance. As a by-product, estimates of the new a priori probabilities are also obtained. This iterative algorithm is a straightforward instance of the expectation-maximization (EM) algorithm and is shown to maximize the likelihood of the new data. Thereafter, we discuss a statistical test that can be applied to decide if the a priori class probabilities have changed from the training set to the real-world data. The procedure is illustrated on different classification problems involving a multilayer neural network, and comparisons with a standard procedure for a priori probability estimation are provided. Our original method, based on the EM algorithm, is shown to be superior to the standard one for a priori probability estimation. Experimental results also indicate that the classifier with adjusted outputs always performs better than the original one in terms of classification accuracy, when the a priori probability conditions differ from the training set to the real-world data. The gain in classification accuracy can be significant.
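
The abstract does not give the update equations, so the following is only a minimal sketch of how such an iterative adjustment could look, assuming the usual prior-shift form: each classifier output is reweighted by the ratio of the current new-prior estimate to the training prior and renormalized (E-step), and the new priors are re-estimated as the average of the adjusted outputs (M-step). The function names `reweight` and `em_adjust`, the stopping rule, and the toy inputs are illustrative assumptions, not taken from the paper. The sketch also assumes every training prior is strictly positive and every row of outputs sums to 1.

```python
from typing import List, Sequence, Tuple


def reweight(
    posteriors: Sequence[Sequence[float]],
    train_priors: Sequence[float],
    new_priors: Sequence[float],
) -> List[List[float]]:
    """Rescale each output by new_prior / train_prior and renormalize per row.

    Used alone, this is the correction for the case where the new priors
    are already known (hypothetical helper, assumed form).
    """
    adjusted = []
    for row in posteriors:
        weighted = [p * new_priors[i] / train_priors[i] for i, p in enumerate(row)]
        total = sum(weighted)
        adjusted.append([w / total for w in weighted])
    return adjusted


def em_adjust(
    posteriors: Sequence[Sequence[float]],
    train_priors: Sequence[float],
    max_iter: int = 1000,
    tol: float = 1e-10,
) -> Tuple[List[float], List[List[float]]]:
    """Estimate the new priors by EM and return (new_priors, adjusted outputs)."""
    n_classes = len(train_priors)
    new_priors = list(train_priors)  # start from the training-set priors
    for _ in range(max_iter):
        # E-step: adjust the outputs under the current prior estimate.
        adjusted = reweight(posteriors, train_priors, new_priors)
        # M-step: new prior of each class = mean adjusted output for that class.
        updated = [
            sum(row[i] for row in adjusted) / len(adjusted) for i in range(n_classes)
        ]
        change = max(abs(u - p) for u, p in zip(updated, new_priors))
        new_priors = updated
        if change < tol:  # assumed stopping rule: priors have stopped moving
            break
    # Final pass so the returned outputs match the returned priors.
    return new_priors, reweight(posteriors, train_priors, new_priors)


# Toy usage: a fully confident classifier trained with balanced classes,
# applied to new data in which three of four points belong to class 0.
outputs = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
priors, adjusted_outputs = em_adjust(outputs, train_priors=[0.5, 0.5])
print(priors)  # [0.75, 0.25]
```

No refitting of the classifier is involved: only its outputs on the new data and the training-set priors are needed, which matches the setting the abstract describes.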

Journal: Neural Computation
Publication Date: Jan 1, 2002
Citations: 303

Similar Papers

Comparisons of Two Methods for Haplotype Reconstruction and Haplotype Frequency Estimation from Population Data
Shuanglin Zhang ... Hongyu Zhao
The American Journal of Human Genetics | VOL. 69
01 Oct 2001

Stochastic Dynamic Modeling of Short Gene Expression Time-Series Data
Zidong Wang ... Stephen Swift
IEEE Transactions on NanoBioscience | VOL. 7
01 Mar 2008

Expectation-maximization algorithms for learning a finite mixture of univariate survival time distributions from partially specified class values
Youngrok Lee
15 May 2013

Expectation-maximization algorithms for learning a finite mixture of univariate survival time distributions from partially specified class values
Youngrok Lee
24 Sep 2013
