A $k$-points-based distance for robust geometric inference

Claire Brécheteau, Clément Levrard

doi:10.3150/20-bej1214

Abstract

Analyzing the sub-level sets of the distance to a compact submanifold of $\mathbb{R}^{d}$ is a common way to understand its topology in topological data analysis. Topological inference procedures therefore usually rely on a distance estimate based on $n$ sample points (Discrete Comput. Geom. 33 (2005) 249–274). When the sample points are corrupted by noise, the distance-to-measure function (DTM, Found. Comput. Math. 11 (2011) 733–751) is a surrogate for the distance-to-compact-set function. In practice, approximating the homology of its sub-level sets requires computing the homology of unions of $n$ balls (Discrete Comput. Geom. 49 (2013) 22–45; In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms (2015) 168–180 SIAM), which may become intractable when $n$ is large. To address both problems at once, a large number of points and noise, we introduce the $k$-power-distance-to-measure function ($k$-PDTM). This new surrogate for the distance-to-compact-set function is a $k$-points-based approximation of the DTM. These $k$ points are minimizers of a robustified version of the classical $k$-means criterion (In Proc. Fifth Berkeley Sympos. Math. Statist. and Probability (Berkeley, Calif., 1965/66) (1967) 281–297 Univ. California Press). The sub-level sets of the $k$-PDTM are unions of $k$ balls, and this distance is also proved to be robust to noise. We assess the quality of this approximation for $k$ possibly drastically smaller than $n$, and provide an algorithm to compute the $k$-PDTM from a sample. Numerical experiments illustrate the good behavior of this $k$-points approximation in a noisy topological inference framework.
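To make the two objects in the abstract concrete, the following is a minimal illustrative sketch, not the authors' algorithm. It assumes the standard empirical form of the DTM with mass parameter $h = q/n$ (the root mean squared distance from a query point to its $q$ nearest sample points) and a generic power distance built from $k$ centers and non-negative weights. The function names `empirical_dtm` and `power_distance`, and the inputs `centers` and `weights`, are hypothetical; in the paper the $k$ points are obtained by minimizing a robustified $k$-means criterion, which this sketch does not attempt.

```python
import numpy as np


def empirical_dtm(points, queries, q):
    """Empirical DTM with mass parameter h = q / n (assumed standard form).

    Returns, for each query, the square root of the mean squared distance
    to its q nearest sample points.
    """
    points = np.asarray(points, dtype=float)
    queries = np.asarray(queries, dtype=float)
    if not 1 <= q <= len(points):
        raise ValueError("q must satisfy 1 <= q <= number of sample points")
    # Pairwise squared distances, shape (n_queries, n_points).
    diff = queries[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # The first q columns after partitioning hold the q smallest values.
    nearest = np.partition(sq_dist, q - 1, axis=1)[:, :q]
    return np.sqrt(nearest.mean(axis=1))


def power_distance(queries, centers, weights):
    """Power distance sqrt(min_i ||x - c_i||^2 + w_i) for k weighted centers.

    `centers` and `weights` are hypothetical inputs supplied by the caller;
    weights are assumed non-negative.
    """
    queries = np.asarray(queries, dtype=float)
    centers = np.asarray(centers, dtype=float)
    weights = np.asarray(weights, dtype=float)
    diff = queries[:, None, :] - centers[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.sqrt(np.min(sq_dist + weights[None, :], axis=1))
```

With a function of the second form, the sub-level set at level $t$ is the union of the balls centered at $c_i$ with radius $\sqrt{t^{2} - w_i}$, taken over the indices with $w_i \le t^{2}$, which is why only $k$ balls are involved instead of $n$.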
