Intrinsic dimension estimation for locally undersampled data.

Vittorio Erba, Marco Gherardi, Pietro Rotondo

doi:10.1038/s41598-019-53549-9


Open Access



Abstract


Identifying the minimal number of parameters needed to describe a dataset is a challenging problem known in the literature as intrinsic dimension estimation. Existing intrinsic dimension estimators become unreliable whenever the dataset is locally undersampled, and this is at the core of the so-called curse of dimensionality. Here we introduce a new intrinsic dimension estimator that leverages simple properties of the tangent space of a manifold and extends the usual correlation integral estimator to alleviate the extreme undersampling problem. Based on this insight, we explore a multiscale generalization of the algorithm that is capable of (i) identifying multiple dimensionalities in a dataset, and (ii) providing accurate estimates of the intrinsic dimension of extremely curved manifolds. We test the method on manifolds generated from global transformations of high-contrast images, which are relevant for invariant object recognition and are considered a challenge for state-of-the-art intrinsic dimension estimators.

Highlights

Identifying the minimal number of parameters needed to describe a dataset is a challenging problem known in the literature as intrinsic dimension estimation.
Manifold learning and dimensional reduction[1,2,3] are the main techniques employed to perform this task. Several of these approaches work under the reasonable assumption that the points of a dataset, represented as vectors of real numbers lying in a space of large embedding dimension D, belong to a manifold whose intrinsic dimension (ID) d is much lower than D.
Principal component analysis (PCA) is the main representative of this class of algorithms. Both a global and a multiscale version of the algorithm are used[15,24]. In the former, one evaluates the covariance matrix on the whole dataset X, whereas in the latter, one performs the spectral analysis on local subsets X(x0,rc) of X, obtained by selecting one particular point x0 and including in the local covariance matrix only those points that lie inside a cutoff radius rc, which is varied.
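
The difference between the global and the local (cutoff-radius) version of PCA can be illustrated with a minimal sketch. It assumes NumPy is available, and it reads the dimension off the spectrum with an explained-variance threshold of 0.95, which is an illustrative choice rather than a criterion stated here. The function names `pca_dimension` and `local_pca_dimension` are hypothetical.

```python
import numpy as np

def pca_dimension(points, variance_threshold=0.95):
    """Count the leading covariance eigenvalues needed to reach the
    given fraction of total variance (threshold is an assumption)."""
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered / max(len(points) - 1, 1)
    # eigvalsh returns ascending eigenvalues of a symmetric matrix.
    eigvals = np.clip(np.linalg.eigvalsh(cov)[::-1], 0.0, None)
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(cumulative, variance_threshold) + 1)

def local_pca_dimension(points, x0, r_c, variance_threshold=0.95):
    """Same spectral analysis, restricted to the local subset X(x0, r_c):
    the points within cutoff radius r_c of the chosen point x0."""
    inside = np.linalg.norm(points - x0, axis=1) <= r_c
    return pca_dimension(points[inside], variance_threshold)

# Toy dataset: a unit circle (d = 1) embedded in D = 3 dimensions.
angles = np.linspace(0.0, 2.0 * np.pi, 2000, endpoint=False)
X = np.column_stack([np.cos(angles), np.sin(angles), np.zeros_like(angles)])

global_dim = pca_dimension(X)
local_dim = local_pca_dimension(X, x0=X[0], r_c=0.2)
print(global_dim, local_dim)
```

On this toy curve the global covariance sees the plane that contains the circle, while a small cutoff radius sees only the tangent direction. Shrinking rc further reduces the number of points in the local subset, which is the undersampling regime the abstract refers to.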

Summary

Introduction

Identifying the minimal number of parameters needed to describe a dataset is a challenging problem known in the literature as intrinsic dimension estimation. This observation translates into a simple exact algorithm to determine the ID d of linearly embedded spherical datasets, by performing a non-linear regression of the empirical density of neighbours using the full correlation integral (FCI) in Eq. 2.
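
The idea of fitting the empirical density of neighbours of spherical data against a d-dependent curve can be sketched as follows. This is not the paper's Eq. 2, which is not reproduced in this excerpt. As a stand-in, the sketch uses the standard fact that the angle between two uniform points on a unit d-dimensional sphere has density proportional to sin(θ)^(d-1), with chord distance r = 2 sin(θ/2), and integrates it numerically. A grid search over d replaces a proper non-linear regression, and all function names are hypothetical.

```python
import bisect
import math
import random

def sphere_points(n, d, rng):
    """Sample n points uniformly on the unit d-sphere embedded in R^(d+1)."""
    pts = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(d + 1)]
        norm = math.sqrt(sum(x * x for x in v))
        pts.append([x / norm for x in v])
    return pts

def empirical_neighbour_fraction(pts, radii):
    """Fraction of point pairs at Euclidean distance <= r, for each r."""
    dists = sorted(math.dist(p, q) for i, p in enumerate(pts) for q in pts[i + 1:])
    return [bisect.bisect_right(dists, r) / len(dists) for r in radii]

def sphere_neighbour_fraction(r, d, steps=200):
    """Theoretical pair fraction on the unit d-sphere (stand-in curve):
    integrate sin(theta)^(d-1) up to the angle whose chord is r."""
    theta_max = 2.0 * math.asin(min(1.0, r / 2.0))

    def integral(upper):
        h = upper / steps  # midpoint rule
        return h * sum(math.sin((k + 0.5) * h) ** (d - 1) for k in range(steps))

    return integral(theta_max) / integral(math.pi)

def fit_dimension(radii, observed, candidates):
    """Pick the candidate d with the smallest squared error (grid search)."""
    def loss(d):
        return sum((sphere_neighbour_fraction(r, d) - o) ** 2
                   for r, o in zip(radii, observed))
    return min(candidates, key=loss)

rng = random.Random(0)
true_d = 3
pts = sphere_points(300, true_d, rng)
radii = [0.1 * k for k in range(1, 20)]
observed = empirical_neighbour_fraction(pts, radii)
candidates = [1 + 0.1 * k for k in range(91)]  # d from 1.0 to 10.0
d_hat = fit_dimension(radii, observed, candidates)
print(d_hat)
```

The fit uses distances at all scales up to the sphere diameter rather than only the small-radius limit, which is why a few hundred points are enough in this toy setting. Real data would first need to be centred and normalised onto a sphere, a step this sketch skips because the points are generated on the unit sphere directly.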

Results

Conclusion