Abstract
In practical problems, high-dimensional data usually has a low-dimensional structure; that is, the data lies on a low-dimensional manifold. The dimension of this manifold is called the intrinsic dimension of the data. Among the many intrinsic dimension estimation methods, those based on the correlation dimension have received extensive attention. However, correlation dimension-based methods often yield an estimate lower than the true intrinsic dimension of the dataset. To explore the reasons behind this underestimation, the probabilities of underestimation, overestimation, and proper estimation are analyzed using order statistics. The analysis shows that the probability of underestimation is much higher than that of the other two cases, a result that is verified by simulation experiments. Based on this analysis, a new intrinsic dimension estimation method is proposed that combines the correlation dimension with the k-nearest neighbor (kNN) method and effectively reduces underestimation. The method is implemented using two algorithms, a search algorithm and a matching algorithm. Comprehensive experiments on simulated and real datasets show that the proposed algorithms are more effective than the comparison methods.
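For readers unfamiliar with the correlation dimension, the following is a minimal sketch of the classical two-radius correlation-integral estimate. It is not the search or matching algorithm proposed in this paper. The function names, the radii 0.05 and 0.1, the sample size, and the synthetic dataset (a 2-dimensional plane embedded in 5-dimensional space) are all assumptions chosen for illustration.

```python
import numpy as np


def correlation_integral(points, r):
    """Fraction of distinct point pairs whose Euclidean distance is below r."""
    n = len(points)
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    upper = np.triu_indices(n, k=1)  # count each unordered pair once
    return np.mean(dists[upper] < r)


def correlation_dimension(points, r1, r2):
    """Slope of log C(r) against log r between two radii r1 < r2."""
    c1 = correlation_integral(points, r1)
    c2 = correlation_integral(points, r2)
    return (np.log(c2) - np.log(c1)) / (np.log(r2) - np.log(r1))


# Illustrative data (an assumption, not from the paper): points drawn
# uniformly from a unit square, then embedded isometrically in 5-D.
rng = np.random.default_rng(0)
latent = rng.uniform(size=(1000, 2))
basis = np.linalg.qr(rng.normal(size=(5, 2)))[0]  # 5x2, orthonormal columns
data = latent @ basis.T  # shape (1000, 5), true intrinsic dimension is 2

estimate = correlation_dimension(data, 0.05, 0.1)
print(f"estimated intrinsic dimension: {estimate:.2f}")
```

On bounded data such as this, points near the boundary have fewer neighbors within a given radius, which tends to pull the slope slightly below the true dimension. This is one simple way to observe the kind of underestimation discussed above, though the paper's own analysis rests on order statistics rather than on this boundary argument.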