Abstract

Estimating the error probability is of primordial importance for classifier selection. The method explored in the previous chapter attempts to solve this problem by using a testing sequence to obtain a reliable holdout estimate. The independence of the testing and training sequences leads to a rather straightforward analysis. For good performance, the testing sequence has to be sufficiently large (although we often get away with testing sequences as small as about $\log n$). When data are expensive, this constitutes a waste. Assume that we do not split the data and use the same sequence for both training and testing. Often dangerous, this strategy nevertheless works if the class of rules from which we select is sufficiently restricted. The error estimate in this case is appropriately called the resubstitution estimate, and it will be denoted by $L_n^{(R)}$. This chapter explores its virtues and pitfalls. A third error estimate, the deleted estimate, is discussed in the next chapter. Estimates based upon other paradigms are treated briefly in Chapter 31.
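To make the contrast concrete, here is a minimal sketch (not from the chapter itself) comparing a holdout estimate with the resubstitution estimate $L_n^{(R)}$ on synthetic data. The Gaussian class-conditional distributions, the sample size, and the choice of the 1-nearest-neighbor rule are all illustrative assumptions; 1-NN is picked deliberately because its resubstitution estimate is zero by construction, an extreme instance of the optimistic bias that arises when the class of rules is too rich.

```python
# A minimal sketch contrasting holdout and resubstitution error estimates.
# The data, classifier, and split are illustrative assumptions, not the
# chapter's construction.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Synthetic two-class sample: each class a 2-d Gaussian (assumed example).
n = 200
X = np.vstack([rng.normal(0.0, 1.0, (n // 2, 2)),
               rng.normal(1.5, 1.0, (n // 2, 2))])
y = np.repeat([0, 1], n // 2)

# Holdout estimate: train on one half, evaluate on the independent half.
X_train, X_test = X[::2], X[1::2]
y_train, y_test = y[::2], y[1::2]
clf = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
holdout_error = np.mean(clf.predict(X_test) != y_test)

# Resubstitution estimate L_n^(R): evaluate on the training data itself.
# For the 1-NN rule this is 0 almost surely, since every training point
# is its own nearest neighbor -- the optimistic bias in its purest form.
resub_error = np.mean(clf.predict(X_train) != y_train)

print(f"holdout estimate:        {holdout_error:.3f}")
print(f"resubstitution estimate: {resub_error:.3f}")
```

Had the rule been drawn from a suitably restricted class (say, a single fixed linear discriminant instead of 1-NN), the two estimates would typically be close, which is precisely the regime in which the abstract says resubstitution works.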
