On Strong Consistency of Model Selection in Classification

J Suzuki

doi:10.1109/tit.2006.883611

Abstract

This paper considers model selection in classification. In many applications, such as pattern recognition, probabilistic inference using a Bayesian network, and prediction of the next element in a sequence based on a Markov chain, the conditional probability $P(Y=y \mid X=x)$ of class $y \in Y$ given attribute value $x \in X$ is utilized. By model we mean the equivalence relation in $X$: for $x, x' \in X$, $x \sim x' \iff P(Y=y \mid X=x) = P(Y=y \mid X=x')$ for all $y \in Y$. By classification we mean that the number of such equivalence classes is finite. We estimate the model from $n$ samples $z^n = (x_i, y_i)_{i=1}^{n} \in (X \times Y)^n$, using information criteria of the form empirical entropy $H$ plus penalty term $(k/2)d_n$ (the model that minimizes $H + (k/2)d_n$ is the estimated model), where $k$ is the number of independent parameters in the model and $\{d_n\}_{n=1}^{\infty}$ is a real nonnegative sequence such that $\limsup_n d_n/n = 0$. For autoregressive processes, although the definitions of $H$ and $k$ are different, it is known that the estimated model almost surely coincides with the true model as $n \to \infty$ if $\{d_n\}_{n=1}^{\infty} > \{2\log\log n\}_{n=1}^{\infty}$, and that it does not if $\{d_n\}_{n=1}^{\infty} < \{2\log\log n\}_{n=1}^{\infty}$ (Hannan and Quinn). Whether the same property holds for classification was an open problem. This paper solves the problem in the affirmative.
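The criterion described in the abstract can be illustrated with a small sketch. The abstract does not spell out how H and k are computed, so the code below rests on assumptions: H is taken to be the unnormalized empirical conditional entropy of the class given the equivalence class of the attribute (in nats), k is taken to be (number of equivalence classes) times (|Y| - 1), and d_n = log n is used as one example of a penalty sequence with d_n/n tending to 0. The function names and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def empirical_entropy(samples, partition):
    """Unnormalized empirical conditional entropy of y given the block of x.

    Assumed form: H = -sum over (block, y) of n(block, y) * log(n(block, y) / n(block)).
    """
    # Map each attribute value to the index of the block that contains it.
    block_of = {x: b for b, block in enumerate(partition) for x in block}
    joint = Counter((block_of[x], y) for x, y in samples)
    marginal = Counter(block_of[x] for x, _ in samples)
    return -sum(c * math.log(c / marginal[b]) for (b, _), c in joint.items())


def criterion(samples, partition, num_classes, d_n):
    """Information criterion H + (k/2) * d_n for one candidate model."""
    # Assumed parameter count: one free distribution over Y per block.
    k = len(partition) * (num_classes - 1)
    return empirical_entropy(samples, partition) + 0.5 * k * d_n


def select_model(samples, candidates, num_classes, d_n):
    """Return the candidate partition that minimizes the criterion."""
    return min(candidates, key=lambda p: criterion(samples, p, num_classes, d_n))


# Toy data (hypothetical): x = 0 and x = 1 behave identically, x = 2 differs.
samples = [(0, 0)] * 10 + [(1, 0)] * 10 + [(2, 1)] * 10
candidates = [
    [{0, 1, 2}],        # one equivalence class
    [{0, 1}, {2}],      # two equivalence classes
    [{0}, {1}, {2}],    # three equivalence classes
]
d_n = math.log(len(samples))  # example penalty sequence; an assumption
estimated = select_model(samples, candidates, num_classes=2, d_n=d_n)
```

On this toy data the two finer partitions both fit the samples with zero empirical entropy, so the penalty term decides between them and favors the partition with fewer equivalence classes.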
