Abstract
We investigate the estimation of a density $f$ from an $n$-sample on a Euclidean space $\mathbb{R}^D$, when the data are supported by an unknown submanifold $M$ of possibly unknown dimension $d<D$, under a reach condition. We study several nonparametric kernel methods, with data-driven bandwidths that incorporate some learning of the geometry via a local dimension estimator. When $f$ has Hölder smoothness $\beta$ and $M$ has regularity $\alpha$, our estimator achieves the rate $n^{-(\alpha\wedge\beta)/(2(\alpha\wedge\beta)+d)}$ for a pointwise loss. The rate does not depend on the ambient dimension $D$, and we establish that our procedure is asymptotically minimax for $\alpha\geq\beta$. Following Lepski's principle, a bandwidth selection rule is shown to achieve smoothness adaptation. We also investigate the case $\alpha\leq\beta$: by estimating, in some sense, the underlying geometry of $M$, we establish in dimension $d=1$ that the minimax rate is $n^{-\beta/(2\beta+1)}$, proving in particular that it does not depend on the regularity of $M$. Finally, a numerical implementation on some case studies confirms the practical feasibility of our estimators.
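To illustrate the idea behind the stated rate, the following is a minimal Python sketch, not the authors' procedure. It assumes the intrinsic dimension $d$ and the smoothness $\beta$ are known (the abstract's method estimates the dimension locally and selects the bandwidth by a Lepski-type rule instead), and it uses a Gaussian kernel normalised in dimension $d$ rather than $D$. The function names `rate_exponent`, `kde_on_manifold` and `sample_circle`, and the test case of uniform data on a unit circle embedded in $\mathbb{R}^3$, are hypothetical choices made for this example.

```python
import numpy as np


def rate_exponent(alpha, beta, d):
    """Exponent s / (2s + d) with s = min(alpha, beta), as in the stated rate."""
    s = min(alpha, beta)
    return s / (2 * s + d)


def kde_on_manifold(x0, X, h, d):
    """Gaussian kernel estimate at x0, normalised with the intrinsic dimension d.

    Distances are ambient Euclidean distances in R^D, but the scaling h**d and
    the kernel constant use d, so D never enters the normalisation.
    """
    sq_dist = np.sum((X - x0) ** 2, axis=1)
    kernel = np.exp(-sq_dist / (2 * h**2)) / (2 * np.pi) ** (d / 2)
    return kernel.sum() / (len(X) * h**d)


def sample_circle(n, rng):
    """Uniform sample on the unit circle, embedded in R^3 (assumed toy data)."""
    theta = rng.uniform(0.0, 2.0 * np.pi, n)
    return np.column_stack([np.cos(theta), np.sin(theta), np.zeros(n)])


rng = np.random.default_rng(0)
n, d, beta = 20000, 1, 2          # assumed known for this sketch
X = sample_circle(n, rng)
h = n ** (-1.0 / (2 * beta + d))  # bandwidth balancing bias and variance
x0 = np.array([1.0, 0.0, 0.0])    # evaluation point on the circle
estimate = kde_on_manifold(x0, X, h, d)
true_density = 1.0 / (2.0 * np.pi)  # uniform density w.r.t. arc length
```

In this toy setting the estimate can be compared with `true_density`; the exponent returned by `rate_exponent` depends only on $\alpha\wedge\beta$ and $d$, which mirrors the claim that the ambient dimension $D$ plays no role in the rate.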
Highlights
In order to recover the density $f$ at a given point $x_0 \in \mathbb{R}^D$ of the ambient space, one has to understand the minimal geometry of $M$ that must be learned from the data and how this geometry affects the optimal reconstruction of $f$.
The data naturally lie on a submanifold, such as a spheroid in geological applications or a cell membrane in microbiology.
The point $x_0$ can be seen as one of the observations $X_i$, but there are also situations where the statistician knows whether or not a given point $x_0$ lies within the support without knowing the geometric features of that support and without needing to estimate them.
Summary
Suppose we observe an $n$-sample $(X_1, \ldots, X_n)$ distributed on a Euclidean space $\mathbb{R}^D$ according to some density function $f$. In order to recover the density $f$ at a given point $x_0 \in \mathbb{R}^D$ of the ambient space, one has to understand the minimal geometry of the supporting submanifold $M$ that must be learned from the data and how this geometry affects the optimal reconstruction of $f$. The data naturally lie on a submanifold, such as a spheroid in geological applications or a cell membrane in microbiology (see for instance Klein et al. [43], who describe a technique that yields such a point cloud). In this case, $x_0$ can be seen as one of the observations $X_i$ above, but there are also situations where the statistician knows whether or not a given point $x_0$ lies within the support (for instance a point on a cell membrane, or a geographical location on the Earth's surface) without knowing the geometric features of that support and without needing to estimate them.