Abstract

We consider the problem of estimating the support of a measure from a finite, independent sample. The estimators considered are built from the empirical Christoffel function. Such estimators have been proposed for set estimation with heuristic justifications. We carry out a detailed finite-sample analysis that allows us to select the threshold and degree parameters as functions of the sample size, and we provide a convergence rate analysis of the resulting support estimation procedure. Our analysis establishes finite-sample bounds that are comparable to existing rates for other set estimation procedures. Our results rely on concentration inequalities for the empirical Christoffel function and on estimates of the supremum of the Christoffel-Darboux kernel on sets with smooth boundaries, which may be of independent interest.

Highlights

  • Given a measure $\nu$ on $\mathbb{R}^p$ and under appropriate assumptions, the Christoffel function with degree bound $d \in \mathbb{N}$ can be defined on $\mathbb{R}^p$ as $\Lambda_{\nu,d} : z \mapsto \min_{\deg P \le d,\; P(z) = 1} \int P^2 \, d\nu$, where the minimum is over all polynomials $P$ of degree at most $d$ satisfying $P(z) = 1$

  • The empirical Christoffel function $\Lambda_{\mu_n,d}$ is associated to an input measure $\mu_n$, which is a scaled counting measure uniformly supported on a cloud of data points

  • We have provided a detailed quantitative finite sample analysis of support estimation based on the empirical Christoffel function
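
The definition of the empirical Christoffel function in the highlights above can be made concrete with a short numerical sketch. It relies on the standard identity that, when the empirical moment matrix is invertible, the minimum equals the inverse of a quadratic form in that matrix. The monomial basis, the function names, the uniform sample on a square and the threshold value `gamma` are all assumptions chosen for illustration; they are not the paper's implementation or its tuned parameters.

```python
import itertools
import numpy as np


def monomial_exponents(p, d):
    """All exponent tuples (a_1, ..., a_p) with a_1 + ... + a_p <= d."""
    return [a for a in itertools.product(range(d + 1), repeat=p) if sum(a) <= d]


def monomial_features(X, exponents):
    """Evaluate every monomial x^a at every row of X; shape (n_points, s(d))."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    return np.stack([np.prod(X ** np.array(a), axis=1) for a in exponents], axis=1)


def empirical_christoffel(X, Z, d):
    """Empirical Christoffel function of the sample X, evaluated at the rows of Z.

    Uses Lambda(z) = 1 / (v(z)^T M^{-1} v(z)), where M is the empirical
    moment matrix of the monomial basis (assumed invertible).
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))
    exponents = monomial_exponents(X.shape[1], d)
    V = monomial_features(X, exponents)       # basis evaluated at the sample
    M = V.T @ V / X.shape[0]                  # empirical moment matrix
    W = monomial_features(Z, exponents)       # basis evaluated at query points
    kappa = np.einsum("ij,ij->i", W, np.linalg.solve(M, W.T).T)
    return 1.0 / kappa


# Illustrative usage: hypothetical sample, uniform on the square [-1, 1]^2.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(1000, 2))
Z = np.array([[0.0, 0.0], [0.9, 0.9], [2.0, 2.0]])
lam = empirical_christoffel(X, Z, d=4)

gamma = 1e-2                  # arbitrary threshold, NOT the paper's choice
estimated_inside = lam >= gamma   # plug-in support estimate at the query points
```

The values in `lam` are expected to be much smaller far from the sample than inside it, which is what makes thresholding a plausible support estimator. How to pick `d` and `gamma` as functions of the sample size is exactly what the paper's analysis addresses, and this sketch does not attempt it.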

Summary

Introduction

Given a measure $\nu$ on $\mathbb{R}^p$ and under appropriate assumptions, the Christoffel function with degree bound $d \in \mathbb{N}$ can be defined on $\mathbb{R}^p$ as $\Lambda_{\nu,d} : z \mapsto \min_{\deg P \le d,\; P(z) = 1} \int P^2 \, d\nu$. Important references in multivariate settings include [8, 9, 22, 23, 44], which concern specific cases of the input measure $\mu$ and set $S$. These works provide valuable information on the asymptotics of the population Christoffel function as $d$ goes to infinity, and motivate the use of this function in statistical contexts, especially in support recovery. More precise quantification of the relation between the sample size $n$ and the degree bound $d$ is required, but [24] does not provide any explicit way to choose the degree $d$ as a function of $n$, nor any convergence guarantee for the full plug-in approach based on the empirical Christoffel function $\Lambda_{\mu_n,d}$ when $d$ depends on $n$. These shortcomings constitute one of the main motivations for the present work.
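
For readers who want the variational definition in computable form, the following worked equation may help. The notation is assumed here rather than taken from the paper: $v_d(z)$ denotes a polynomial basis of degree at most $d$ evaluated at $z$, and $M_d(\nu)$ the associated moment matrix, assumed invertible. Writing $P = c^\top v_d$, the problem becomes the minimization of $c^\top M_d(\nu)\, c$ subject to $c^\top v_d(z) = 1$, which a Lagrange multiplier argument solves in closed form. The level-set estimator on the last line is one natural reading of the plug-in approach with a threshold $\gamma$; the exact scheme and the choice of $\gamma$ and $d$ are given in the paper and are not reproduced here.

```latex
% Assumed notation: v_d(z) is a basis of polynomials of degree <= d evaluated at z,
% M_d(\nu) = \int v_d v_d^\top \, d\nu is the moment matrix (assumed invertible).
\[
  \Lambda_{\nu,d}(z)
  = \min_{c \,:\, c^\top v_d(z) = 1} c^\top M_d(\nu)\, c
  = \frac{1}{v_d(z)^\top M_d(\nu)^{-1} v_d(z)}
  = \frac{1}{\kappa_{\nu,d}(z,z)},
  \qquad
  c^\star = \frac{M_d(\nu)^{-1} v_d(z)}{v_d(z)^\top M_d(\nu)^{-1} v_d(z)} .
\]
% Here \kappa_{\nu,d}(x,y) = v_d(x)^\top M_d(\nu)^{-1} v_d(y) is the Christoffel-Darboux kernel.
% Hedged sketch of a thresholded plug-in estimator with threshold \gamma > 0:
\[
  \widehat{S}_n = \bigl\{ z \in \mathbb{R}^p : \Lambda_{\mu_n,d}(z) \ge \gamma \bigr\}.
\]
```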

Contribution
Comparison with the existing literature on set estimation
Organisation of the paper
General notation
Problem setting: The following notation and assumptions will be standing throughout the text
Orthonormal polynomials: Since μ satisfies
Moment matrix: Now, let {Pj : 1 ≤ j ≤ s(d)} be a basis of
The Christoffel-Darboux kernel
The Christoffel function: Now, we will define the (population)
The empirical Christoffel function
Overview
Assumptions on the support S: We first introduce the following definitions, notation and assumptions
Assumption on the density w: Now, for δ > 0, we set
Thresholding scheme
Result for the Hausdorff distance between two sets and two boundaries
Result for the
A concentration result for the approximation of the Christoffel function by its empirical counterpart
A heuristic to tune d and γ
Quantitative comparison in dimension 2 and 3: Given the dimension p, we consider the polynomial
Representation in the plane
Empirical convergence rate estimation
Outlier detection on benchmark dataset
Conclusion
Upper bound on the Christoffel function outside S