Abstract

This paper develops readily applicable methods for estimating the intrinsic dimension of multivariate datasets. The proposed methods, which make use of theoretical properties of the empirical distribution functions of (pairwise or pointwise) distances, build on the existing concepts of (i) correlation dimensions and (ii) charting manifolds. They are contrasted with (iii) a maximum likelihood technique and (iv) other recently proposed geometric methods, including MiND and IDEA. This comparison relies on application studies involving simulated examples, a recorded dataset from a glucose processing facility, and several benchmark datasets available from the literature. The performance of the proposed techniques is generally in line with that of other dimension estimators; in particular, the correlation dimension variants compare favorably with the maximum likelihood method in terms of accuracy and computational efficiency.

Highlights

  • Nonparametric concepts to extract m feature components embedded within a set of M recorded variables have gained interest in the scientific community.[23]

  • While parametric intrinsic dimension (ID) estimation methods have been intensively studied in the literature, only relatively recent work addressed the utilization of nonparametric methods

  • Correlation dimension and charting manifold approaches have been proposed as concepts rather than tailored methods that can be readily applied in practice


Summary

Introduction

Nonparametric concepts to extract m feature components embedded within a set of M recorded variables have gained interest in the scientific community.[23] In a nonparametric context, estimating the intrinsic dimension (ID), which can be integer- or real-valued, is challenging. For more traditional parametric models, it is often observed that a particular variable contains information that is also encapsulated in other variables. The variables are interrelated, which allows them to be described by a reduced set of m ∈ ℕ latent variables, with m being the ID. Related (unsupervised) models discriminate between significant and residual information and are, conceptually, of one of the following forms[27,57]:

