Abstract

We investigate the estimation of a density $f$ from an $n$-sample on a Euclidean space $\mathbb{R}^D$, when the data are supported by an unknown submanifold $M$ of possibly unknown dimension $d<D$, under a reach condition. We study several nonparametric kernel methods, with data-driven bandwidths that incorporate some learning of the geometry via a local dimension estimator. When $f$ has Hölder smoothness $\beta$ and $M$ has regularity $\alpha$, our estimator achieves the rate $n^{-(\alpha\wedge\beta)/(2(\alpha\wedge\beta)+d)}$ for a pointwise loss. The rate does not depend on the ambient dimension $D$, and we establish that our procedure is asymptotically minimax for $\alpha\geq\beta$. Following Lepski's principle, a bandwidth selection rule is shown to achieve smoothness adaptation. We also investigate the case $\alpha\leq\beta$: by estimating, in a suitable sense, the underlying geometry of $M$, we establish in dimension $d=1$ that the minimax rate is $n^{-\beta/(2\beta+1)}$, proving in particular that it does not depend on the regularity of $M$. Finally, we implement the estimators numerically on several case studies to confirm their practical feasibility.
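The form of the rate can be recovered from the usual bias-variance heuristic for kernel estimators. The sketch below is our own illustration and not the paper's proof. It assumes three things:

- the pointwise loss is the squared error at $x_0$;
- the bias at bandwidth $h$ is of order $h^{\alpha\wedge\beta}$;
- the variance is of order $1/(nh^d)$, because about $nh^d$ sample points fall in a ball of radius $h$ around $x_0$ on the $d$-dimensional set $M$.

Balancing the two terms then gives:

```latex
\begin{aligned}
\mathbb{E}\big[(\hat f_h(x_0) - f(x_0))^2\big]
  &\lesssim \underbrace{h^{2(\alpha\wedge\beta)}}_{\text{squared bias}}
   + \underbrace{\frac{1}{n h^{d}}}_{\text{variance}}, \\
h^{2(\alpha\wedge\beta)} \asymp \frac{1}{n h^{d}}
  &\iff h^\star \asymp n^{-1/(2(\alpha\wedge\beta)+d)}, \\
\Big(\mathbb{E}\big[(\hat f_{h^\star}(x_0) - f(x_0))^2\big]\Big)^{1/2}
  &\lesssim (h^\star)^{\alpha\wedge\beta}
   = n^{-(\alpha\wedge\beta)/(2(\alpha\wedge\beta)+d)}.
\end{aligned}
```

Only the intrinsic dimension $d$ enters the variance term, which is why $D$ does not appear in the rate. With $d=1$ and the exponent $\beta$ in place of $\alpha\wedge\beta$, the same computation gives the $n^{-\beta/(2\beta+1)}$ rate stated for the case $\alpha\leq\beta$.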

Highlights

  • In order to recover the density $f$ at a given point $x_0 \in \mathbb{R}^D$ of the ambient space, one has to understand the minimal geometry of the support $M$ that must be learned from the data, and how this geometry affects the optimal reconstruction of $f$.

  • In many applications the data naturally lie on a submanifold, such as a spheroid in geology or a cell membrane in microbiology.

  • The point $x_0$ can be seen as an observation $X$. There are also situations where the statistician knows whether a given point $x_0$ lies within the support, without knowing the geometric features of the support and without needing to estimate them.

Summary

Motivation

Suppose we observe an $n$-sample $(X_1, \dots, X_n)$ distributed on a Euclidean space $\mathbb{R}^D$ according to some density function $f$, where the data are supported by an unknown submanifold $M$. In order to recover $f$ at a given point $x_0 \in \mathbb{R}^D$ of the ambient space, one has to understand two things: the minimal geometry of $M$ that must be learned from the data, and how this geometry affects the optimal reconstruction of $f$.

Such data arise naturally. They may lie on a submanifold such as a spheroid in geological applications, or a cell membrane in microbiology (see for instance Klein et al. [43], who describe a technique that yields such a point cloud). In this case, $x_0$ can be seen as an observation $X$ like those above. There are also situations where the statistician knows whether a given point $x_0$ lies within the support without knowing the geometric features of the support and without needing to estimate them. Examples are a point on a cell membrane, or a geographical location on the Earth's surface.
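To make the setting concrete, here is a minimal sketch of a kernel estimate of $f(x_0)$ for data lying on a $d$-dimensional submanifold of $\mathbb{R}^D$. It is not the paper's estimator. The Gaussian kernel, the doubling-ratio dimension estimate, and the helper names `local_dimension` and `kde_on_manifold` are all hypothetical illustrative choices. The key point is that the kernel is evaluated on ambient distances in $\mathbb{R}^D$ but normalised by $h^d$, where $d$ is the estimated intrinsic dimension, rather than by $h^D$.

```python
import numpy as np


def local_dimension(X, x0, r):
    """Hypothetical doubling-ratio estimate of the intrinsic dimension near x0.

    On a d-dimensional submanifold, a ball of radius 2r around x0 holds
    roughly 2**d times as many sample points as a ball of radius r,
    provided r is small compared with the reach.
    """
    dist = np.linalg.norm(X - x0, axis=1)
    n_r = np.count_nonzero(dist <= r)
    n_2r = np.count_nonzero(dist <= 2 * r)
    if n_r == 0:
        raise ValueError("no sample point within radius r of x0")
    return max(1, int(round(np.log2(n_2r / n_r))))


def kde_on_manifold(X, x0, h, d):
    """Gaussian kernel estimate of f(x0) w.r.t. the d-dimensional volume on M.

    Distances are ambient (in R^D); the normalisation uses h**d and the
    d-dimensional Gaussian constant, not the ambient dimension D.
    """
    sq = np.sum((X - x0) ** 2, axis=1) / h**2
    kernel = np.exp(-0.5 * sq) / (2 * np.pi) ** (d / 2)
    return kernel.sum() / (len(X) * h**d)


rng = np.random.default_rng(0)
n = 20_000
theta = rng.uniform(0.0, 2 * np.pi, n)
# Unit circle (d = 1) embedded in R^3 (D = 3); the third coordinate is zero.
X = np.column_stack([np.cos(theta), np.sin(theta), np.zeros(n)])
x0 = np.array([1.0, 0.0, 0.0])

d_hat = local_dimension(X, x0, r=0.1)
f_hat = kde_on_manifold(X, x0, h=0.05, d=d_hat)
# The uniform density on the circle (w.r.t. arc length) is 1/(2*pi), about 0.159.
print(d_hat, f_hat)
```

Using ambient rather than geodesic distances is what makes this usable without knowing $M$. For bandwidths small compared with the reach, the two distances are close.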

Main results
Organisation of the paper
Some material from geometry
Hölder spaces on submanifolds of $\mathbb{R}^D$
The reach of a subset
A statistical model for sampling on an unknown manifold
Choice of a loss function and the reach assumption
Density estimation at a fixed point
Kernel estimation
Smoothness adaptation
Simultaneous adaptation to smoothness and dimension
Numerical illustration
An example of a density supported by a one-dimensional submanifold
An example of a density supported by a two-dimensional submanifold
Adaptation
Additional results of geometry
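The abstract states that a bandwidth selection rule following Lepski's principle achieves smoothness adaptation. Below is a generic Lepski-type sketch, not the calibrated rule of the paper. It assumes:

- the intrinsic dimension $d$ is known;
- the kernel is Gaussian;
- `kappa` is a hypothetical tuning constant;
- the threshold takes the form $\kappa\sqrt{\log n/(nh'^d)}$, mimicking the stochastic error at bandwidth $h'$.

The rule picks the largest bandwidth whose estimate agrees, up to that noise level, with every estimate computed at a smaller bandwidth.

```python
import numpy as np


def kde_on_manifold(X, x0, h, d):
    # Gaussian kernel on ambient distances, normalised with the intrinsic dimension d.
    sq = np.sum((X - x0) ** 2, axis=1) / h**2
    return np.exp(-0.5 * sq).sum() / ((2 * np.pi) ** (d / 2) * len(X) * h**d)


def lepski_bandwidth(X, x0, bandwidths, d, kappa=1.0):
    """Hypothetical Lepski-type rule: return the largest admissible bandwidth
    and its estimate. h is admissible if |f_h - f_h'| <= tau(h') for all h' < h,
    with tau(h') = kappa * sqrt(log n / (n * h'**d))."""
    n = len(X)
    hs = np.sort(np.asarray(bandwidths, dtype=float))
    est = np.array([kde_on_manifold(X, x0, h, d) for h in hs])
    tau = kappa * np.sqrt(np.log(n) / (n * hs**d))
    chosen = 0
    for k in range(len(hs)):
        # Compare the estimate at hs[k] with all estimates at smaller bandwidths.
        if np.all(np.abs(est[k] - est[:k]) <= tau[:k]):
            chosen = k
    return hs[chosen], est[chosen]


rng = np.random.default_rng(1)
n = 20_000
theta = rng.uniform(0.0, 2 * np.pi, n)
X = np.column_stack([np.cos(theta), np.sin(theta), np.zeros(n)])  # unit circle in R^3
x0 = np.array([1.0, 0.0, 0.0])
grid = np.geomspace(0.01, 0.3, 12)

h_hat, f_hat = lepski_bandwidth(X, x0, grid, d=1, kappa=1.0)
print(h_hat, f_hat)
```

In this toy example the true density is uniform, so the bias is negligible at every bandwidth. The rule should then tend to select the largest bandwidth in the grid. With a non-constant $f$, the comparisons would instead reject bandwidths whose bias exceeds the noise level of the smaller ones.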
