Abstract

The intrinsic dimension of a dataset is the dimension of the low-dimensional manifold on which the high-dimensional data lie. Estimating it accurately helps with dimensionality reduction and data preprocessing. Because the spatial distribution of the data is unknown and the sample size is limited, estimation methods that use only distance information tend to underestimate the intrinsic dimension. To reduce estimation complexity and improve accuracy, two estimation algorithms based on ID(κ) are proposed, where κ is the scaling ratio of the neighborhood radius of a sample point. First, following the selection criteria for the parameter κ, an improved algorithm for selecting the optimal scaling ratio κ is proposed; it reduces computational complexity and improves the stability of the estimate. Second, using simulated datasets with the same sample size and known intrinsic dimensions, the relationship between the estimated dimension and the true intrinsic dimension is obtained, and an underestimation modification method for intrinsic dimension estimation is proposed. Comparative experiments on simulated and real datasets indicate that the underestimation modification algorithm achieves high estimation accuracy and robustness.
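
The two ideas in the abstract can be illustrated with a minimal sketch. The abstract does not define ID(κ), so the estimator below is an assumption: it takes r as the distance to the k-th nearest neighbor, counts the neighbors within κr, and uses the fact that counts growing like r^d give a count ratio of about κ^d. The calibration step is likewise a stand-in for the paper's underestimation modification: it fits a quadratic map from estimated to true dimension on uniform hypercube samples of the same size. The function names, the parameter `k`, the defaults, and the quadratic fit are all hypothetical.

```python
import numpy as np


def scaling_ratio_dimension(X, kappa=1.5, k=10):
    """Hypothetical ID(kappa)-style estimate from neighbor counts at radii r and kappa*r."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X * X, axis=1)
    # Squared pairwise distances via the Gram matrix; clip tiny negatives from round-off.
    d2 = np.clip(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0, None)
    np.fill_diagonal(d2, np.inf)  # exclude each point from its own neighborhood
    # r_i^2: squared distance from point i to its k-th nearest neighbor.
    r2 = np.partition(d2, k - 1, axis=1)[:, k - 1]
    # Number of neighbors inside the enlarged ball of radius kappa * r_i.
    n_big = np.sum(d2 <= (kappa ** 2) * r2[:, None], axis=1)
    # If counts grow like r^d, then n_big / k is roughly kappa^d.
    return float(np.mean(np.log(n_big / k) / np.log(kappa)))


def fit_underestimation_correction(n_samples, dims, kappa=1.5, k=10, seed=0):
    """Assumed calibration: learn estimated -> true dimension on simulated data."""
    rng = np.random.default_rng(seed)
    true_dims = list(dims)
    # Uniform hypercube samples with the same sample size and known dimension.
    est = [
        scaling_ratio_dimension(rng.random((n_samples, d)), kappa, k)
        for d in true_dims
    ]
    # Quadratic least-squares map from the raw estimate to the true dimension.
    coeffs = np.polyfit(est, true_dims, deg=2)
    return lambda d_hat: float(np.polyval(coeffs, d_hat))
```

To use the sketch, call `scaling_ratio_dimension(X)` for a raw estimate, build a corrector with `fit_underestimation_correction(len(X), range(1, 7))`, and apply the corrector to the raw value. The calibration is only meaningful when the simulated samples match the real dataset in size and use the same `kappa` and `k`.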
