ABSTRACT U-statistics are a fundamental class of statistics that arise when modelling quantities of interest defined by multi-subject responses. They generalize the empirical mean of a random variable $X$ to sums over all distinct $k$-tuples of observations of $X$. Stute [Conditional U-statistics. Ann. Probab. 19 (1991), pp. 812–825] introduced conditional U-statistics as a generalization of the Nadaraya–Watson estimate of a regression function and proved their strong pointwise consistency for the conditional function $r^{(k)}(\varphi, \tilde{t}) = E(\varphi(Y_1, \ldots, Y_k) \mid (X_1, \ldots, X_k) = \tilde{t})$, for $\tilde{t} \in \mathbb{R}^{pk}$, where $\varphi$ is a measurable function. In this investigation, we develop oracle inequalities and upper bounds for kernel-based estimators of conditional U-statistics of general order, applicable across a wide range of metric spaces associated with operators. Our analysis targets metric spaces equipped with a doubling measure and a non-negative self-adjoint operator whose heat kernel has Gaussian regularity. In certain cases, the convergence rate we obtain is optimal. To derive these results, we study the regression function within a general framework and introduce several novel insights. The findings are established under sufficiently broad conditions on the underlying distributions. The theoretical results serve as tools for the analysis of general-valued data, with potential applications including conditional distribution functions, relative-error prediction, the Kendall rank correlation coefficient, and discrimination problems, all areas of significant independent interest.
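To make the object of study concrete, the following is a minimal sketch of a Stute-type conditional U-statistic in the classical Euclidean setting, not the heat-kernel-based estimator on metric spaces developed in the paper. It assumes scalar covariates ($p = 1$), a Gaussian smoothing kernel, and a single bandwidth `h`; the names `conditional_u_statistic` and `gaussian_kernel` are hypothetical and chosen only for illustration. The estimate is a ratio of kernel-weighted sums over all ordered $k$-tuples of distinct indices, and for $k = 1$ it reduces to the Nadaraya–Watson estimate.

```python
import itertools
import math


def gaussian_kernel(u):
    # Unnormalized Gaussian kernel; the normalizing constant cancels in the ratio.
    return math.exp(-0.5 * u * u)


def conditional_u_statistic(xs, ys, phi, t, h, kernel=gaussian_kernel):
    """Kernel estimate of E[phi(Y_1, ..., Y_k) | (X_1, ..., X_k) = t].

    Assumptions (illustrative only): scalar covariates, one bandwidth h,
    product kernel over the k coordinates of t.
    """
    k = len(t)
    n = len(xs)
    num = 0.0
    den = 0.0
    # Sum over all ordered k-tuples of pairwise distinct indices.
    for idx in itertools.permutations(range(n), k):
        w = 1.0
        for j, i in enumerate(idx):
            w *= kernel((t[j] - xs[i]) / h)
        num += w * phi(*(ys[i] for i in idx))
        den += w
    if den == 0.0:
        raise ValueError("all kernel weights vanish; increase the bandwidth h")
    return num / den
```

For example, `conditional_u_statistic(xs, ys, lambda a, b: float(a > b), (t1, t2), h)` estimates the conditional probability that $Y_1 > Y_2$ given $(X_1, X_2) = (t_1, t_2)$, the kind of quantity that underlies Kendall-type rank correlation. The brute-force enumeration costs on the order of $n^k$ kernel evaluations, so it is only practical for small $n$ and $k$.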