We observe n pairs of independent (but not necessarily i.i.d.) random variables X1=(W1,Y1),…,Xn=(Wn,Yn) and tackle the problem of estimating the conditional distributions Qi⋆(wi) of Yi given Wi=wi for all i∈{1,…,n}. We base our estimator on two assumptions, neither of which needs to hold: that the data are i.i.d. and that the conditional distributions of Yi given Wi=wi belong to a one-parameter exponential family Q¯ whose parameter space is an interval I. More precisely, we pretend that these conditional distributions take the form Qθ(wi)∈Q¯ for some θ that belongs to a VC-class Θ¯ of functions with values in I. For each i∈{1,…,n}, we estimate Qi⋆(wi) by a distribution of the same form, i.e. Qθ̂(wi)∈Q¯, where θ̂=θ̂(X1,…,Xn) is a well-chosen estimator with values in Θ¯. We establish non-asymptotic exponential inequalities for the upper deviations of a Hellinger-type distance between the true conditional distributions of the data and the estimated ones, which are based on the exponential family Q¯ and the class of functions Θ¯ we chose. We show that our estimation strategy is robust to model misspecification, contamination and the presence of outliers. Besides, when the data are truly i.i.d., the exponential family Q¯ is suitably parametrized and the conditional distributions Qi⋆(wi) are of the form Qθ⋆(wi)∈Q¯ for some unknown Hölderian function θ⋆ with values in I, we prove that the estimator θ̂ of θ⋆ is minimax (up to a logarithmic factor). Finally, we provide an algorithm for calculating θ̂ when Θ¯ is a VC-class of functions of low or moderate dimension, and we carry out a simulation study to compare its performance to that of the MLE and median-based estimators. The proof of our main result relies on an upper bound, with explicit numerical constants, on the expectation of the supremum of an empirical process over a VC-subgraph class. This bound may be of independent interest.