Curse of Dimensionality

Abstract

This chapter introduces different notions of tractability and intractability. Most notably, the curse of dimensionality occurs when the solution of a multivariate problem would necessitate a number of observations growing exponentially with the number of variables involved. As illustrations, relying on reproducing-kernel techniques, it is shown on the one hand that the integration of trigonometric polynomials is an intractable problem, and on the other hand that integration in weighted Sobolev spaces is a tractable problem provided the weights decay fast enough.
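
As a point of reference for the notions of tractability discussed in this chapter, the display below sketches how the curse of dimensionality is commonly formalized in information-based complexity; the symbol n(ε, d) and the exponential lower bound are generic illustrative notation, not taken from the chapter itself.

```latex
% n(\varepsilon, d): the minimal number of observations needed to solve the
% d-variate problem to within error \varepsilon. The problem suffers from the
% curse of dimensionality if, for some constants C, \gamma > 0,
\[
  n(\varepsilon, d) \;\ge\; C\,(1 + \gamma)^{d}
  \qquad \text{for all sufficiently small } \varepsilon \text{ and infinitely many } d .
\]
```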

Similar Papers
  • Conference Article
  • Citations: 10
  • 10.1145/3230905.3230909
Addressing the Links Between Dimensionality and Data Characteristics in Gene-Expression Microarrays
  • May 2, 2018
  • J Salvador Sánchez + 1 more

In gene-expression microarray data sets, each sample is defined by hundreds or thousands of measurements. High-dimensional data spaces have been reported as a significant obstacle to applying machine learning algorithms, owing to the associated phenomenon called the 'curse of dimensionality'. Therefore, the analysis (and interpretation) of these data sets has become a challenging problem. The hypothesis set out in this paper is that the curse of dimensionality is directly linked to other intrinsic data characteristics, such as class overlapping and class separability. To examine this hypothesis, we carried out a series of experiments on four gene-expression microarray databases, since these data are a typical example of the so-called 'curse of dimensionality' phenomenon. The results show that there exist meaningful relationships between dimensionality and some specific complexities that are inherent to the data (especially class separability and the geometry of manifolds). Moreover, the behavior of three classifiers as a function of dimensionality and data complexity is also discussed.

  • Research Article
  • Citations: 106
  • 10.1214/09-ss049
Curse of dimensionality and related issues in nonparametric functional regression
  • Jan 1, 2011
  • Statistics Surveys
  • Gery Geenens

Recently, some nonparametric regression ideas have been extended to the case of functional regression. Within that framework, the main concern arises from the infinite-dimensional nature of the explanatory objects. Specifically, in the classical multivariate regression context, it is well known that any nonparametric method is affected by the so-called "curse of dimensionality", caused by the sparsity of data in high-dimensional spaces, resulting in a decrease in the fastest achievable rates of convergence of regression function estimators toward their target curve as the dimension of the regressor vector increases. Therefore, it is not surprising to find dramatically bad theoretical properties for nonparametric functional regression estimators, leading many authors to condemn the methodology. Nevertheless, a closer look at the meaning of the functional data under study and at the conclusions that the statistician would like to draw from them allows one to consider the problem from another point of view, and to justify the use of slightly modified estimators. In most cases, it can be entirely legitimate to measure the proximity between two elements of the infinite-dimensional functional space via a semi-metric, which could prevent those estimators from suffering from what we will call the "curse of infinite dimensionality".
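
The dimension dependence described above is usually quantified by the classical minimax rate of nonparametric regression, reproduced below as background; the smoothness index β and the rate itself are standard results, not notation taken from this abstract.

```latex
% For \beta-smooth regression functions of a d-dimensional regressor, the best
% achievable mean squared error of any estimator \hat m_n behaves like
\[
  \inf_{\hat m_n}\; \sup_{m}\; \mathbb{E}\,\lVert \hat m_n - m \rVert^{2}
  \;\asymp\; n^{-\frac{2\beta}{2\beta + d}},
\]
% so the rate deteriorates as d grows, and degenerates entirely for genuinely
% infinite-dimensional (functional) regressors.
```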

  • Conference Article
  • Citations: 4
  • 10.1109/fskd.2007.449
PCR-Tree: A Compression-Based Index Structure for Similarity Searching in High-Dimensional Image Databases
  • Jan 1, 2007
  • Jiangtao Cui + 2 more

The vector approximation file (VA-file) approach is an efficient high-dimensional indexing method that uses a compression technique to overcome the difficulty of the 'curse of dimensionality'. The VA-file method combined with a tree-based index structure can improve querying efficiency, but it still succumbs to the 'curse of dimensionality'. In this paper, a new high-dimensional indexing structure called the PCR-tree is presented for non-uniformly distributed data sets; it employs an R-tree to manage the approximate vectors in the reduced-dimensionality space. The approximate vectors can be built in the KL transform domain, and low-dimensional MBRs (minimum bounding rectangles) can be used to manage the approximations on the first few principal components. When performing a k-nearest-neighbor search, a lower-bound filtering algorithm is used to reject the improper nodes of the PCR-tree, which reduces the computational complexity and I/O cost without any false dismissals. The experimental results on large image databases show that the new approach provides a faster search speed than other tree-structured vector approximation approaches.
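
The pruning idea described above (reject a candidate whose lower-bound distance in the reduced space already exceeds the current k-th best full distance) can be sketched as follows. This is a minimal illustration only: the flat scan, the SVD-based projection standing in for the KL transform, and the absence of the R-tree/MBR machinery are all simplifications of the paper's actual index.

```python
import numpy as np

def knn_with_lower_bound_filter(data, query, k, n_components=8):
    """k-NN search that prunes candidates using distances in a PCA subspace.

    Because the projection onto the leading principal components is orthogonal,
    the distance computed in the reduced space is a lower bound on the full
    Euclidean distance, so pruning on it never causes a false dismissal.
    """
    mean = data.mean(axis=0)
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    basis = vt[:n_components].T                      # (dim, n_components)

    data_low = (data - mean) @ basis                 # reduced-dim approximations
    query_low = (query - mean) @ basis

    best = []                                        # list of (distance, index)
    for i, (x_low, x) in enumerate(zip(data_low, data)):
        lower_bound = np.linalg.norm(x_low - query_low)
        if len(best) == k and lower_bound >= best[-1][0]:
            continue                                 # pruned without a full-distance computation
        dist = np.linalg.norm(x - query)             # full-dimensional distance
        best.append((dist, i))
        best.sort()
        best = best[:k]
    return [i for _, i in best]
```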

  • Research Article
  • Citations: 14
  • 10.1016/j.patcog.2020.107685
Robust sparse coding for one-class classification based on correntropy and logarithmic penalty function
  • Sep 28, 2020
  • Pattern Recognition
  • Hong-Jie Xing + 2 more

  • Research Article
  • Citations: 15
  • 10.1007/s42985-021-00100-z
Deep neural network approximations for solutions of PDEs based on Monte Carlo algorithms
  • Jun 8, 2022
  • Partial Differential Equations and Applications
  • Philipp Grohs + 2 more

In the past few years deep artificial neural networks (DNNs) have been successfully employed in a large number of computational problems including, e.g., language processing, image recognition, fraud detection, and computational advertisement. Recently, it has also been proposed in the scientific literature to reformulate high-dimensional partial differential equations (PDEs) as stochastic learning problems and to employ DNNs together with stochastic gradient descent methods to approximate the solutions of such high-dimensional PDEs. There are also a few mathematical convergence results in the scientific literature which show that DNNs can approximate solutions of certain PDEs without the curse of dimensionality, in the sense that the number of real parameters employed to describe the DNN grows at most polynomially both in the PDE dimension d ∈ ℕ and in the reciprocal of the prescribed approximation accuracy ε > 0. One key argument in most of these results is, first, to employ a Monte Carlo approximation scheme which can approximate the solution of the PDE under consideration at a fixed space-time point without the curse of dimensionality and, thereafter, to prove that DNNs are flexible enough to mimic the behaviour of the employed approximation scheme. Having this in mind, one could aim for a general abstract result which shows under suitable assumptions that if a certain function can be approximated by any kind of (Monte Carlo) approximation scheme without the curse of dimensionality, then the function can also be approximated with DNNs without the curse of dimensionality. It is the subject of this article to make a first step in this direction. In particular, the main result of this paper, roughly speaking, shows that if a function can be approximated by means of some suitable discrete approximation scheme without the curse of dimensionality, and if there exist DNNs which satisfy certain regularity properties and which approximate this discrete approximation scheme without the curse of dimensionality, then the function itself can also be approximated with DNNs without the curse of dimensionality. Moreover, for the number of real parameters used to describe such approximating DNNs we provide an explicit upper bound for the optimal exponent of the dimension d ∈ ℕ of the function under consideration, as well as an explicit lower bound for the optimal exponent of the prescribed approximation accuracy ε > 0. As an application of this result we derive that solutions of suitable Kolmogorov PDEs can be approximated with DNNs without the curse of dimensionality.
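
The phrase "without the curse of dimensionality" used throughout this abstract is typically formalized by a parameter-count bound of the following shape; the constants and exponents shown are generic placeholders rather than the paper's explicit values.

```latex
% A family of DNNs (\Phi_{d,\varepsilon}) approximates the target functions
% without the curse of dimensionality if
\[
  \#\,\mathrm{params}\bigl(\Phi_{d,\varepsilon}\bigr) \;\le\; C\, d^{\,p}\, \varepsilon^{-q}
  \qquad \text{for all } d \in \mathbb{N},\ \varepsilon \in (0,1],
\]
% i.e. the network size grows at most polynomially in the PDE dimension d and
% in the reciprocal of the prescribed accuracy \varepsilon.
```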

  • Research Article
  • Citations: 12
  • 10.1007/s10844-008-0056-5
A local semi-supervised Sammon algorithm for textual data visualization
  • May 26, 2008
  • Journal of Intelligent Information Systems
  • Manuel Martín-Merino + 1 more

Sammon's mapping is a powerful non-linear technique that allows us to visualize high-dimensional object relationships. It has been applied to a broad range of practical problems, and particularly to the visualization of the semantic relations among terms in textual databases. The word maps generated by the Sammon mapping suffer from low discriminant power due to the well-known "curse of dimensionality" and to the unsupervised nature of the algorithm. Fortunately, textual databases frequently provide a manually created classification for a subset of documents that may help to overcome this problem. In this paper we first introduce a modification of the Sammon mapping (SSammon) that enhances the local topology, reducing the sensitivity to the 'curse of dimensionality'. Next, a semi-supervised version is proposed that takes advantage of the a priori categorization of a subset of documents to improve the discriminant power of the generated word maps. The new algorithm has been applied to the challenging problem of word map generation. The experimental results suggest that the new model significantly improves on well-known unsupervised alternatives.
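
For reference, Sammon's original stress functional, which the SSammon variant modifies to further emphasize local topology, has the well-known form below, where d*_{ij} are the input-space dissimilarities and d_{ij} the distances in the low-dimensional map.

```latex
\[
  E \;=\; \frac{1}{\sum_{i<j} d^{*}_{ij}}
          \sum_{i<j} \frac{\bigl(d^{*}_{ij} - d_{ij}\bigr)^{2}}{d^{*}_{ij}} ,
\]
% the 1/d^{*}_{ij} weighting already favours preserving small (local)
% dissimilarities over large ones.
```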

  • Research Article
  • 10.15866/irecos.v8i9.3467
Fusion of Global Shape and Local Features Using Meta-Classifier Framework
  • Sep 30, 2013
  • International Review on Computers and Software
  • Noridayu Manshor + 2 more

In computer vision, objects in an image can be described using many features such as shape, color, texture, and local features. The number of dimensions differs for each type of feature. The underlying belief, from a recognition point of view, is that the more features are used, the better the recognition performance. However, having more features does not necessarily correlate with better performance. The higher-dimensional vectors resulting from fusion might contain irrelevant or noisy features that can degrade classifier performance. Repetitive and potentially useless information might be present, which further escalates the 'curse of dimensionality' problem. Consequently, unwanted and irrelevant features are removed from the combination of features. Although this technique provides promising recognition performance, it is not efficient in terms of computational time for model building. This study proposes a meta-classifier framework to ensure that no relevant features are ignored, while keeping computational time minimal. In this framework, individual classifiers are trained using the local and global shape features, respectively. Then, these classifiers' results are combined as input to the meta-classifier. Experimental results are shown to be comparable or superior to existing state-of-the-art works for object class recognition.
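
The fusion scheme described above is essentially stacking: base classifiers trained on separate feature sets, with their outputs feeding a meta-classifier. The sketch below shows the idea with scikit-learn; the SVM base learners, the logistic-regression meta-learner, and the hypothetical feature matrices X_shape and X_local are assumptions, not the authors' exact pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

def fit_meta_classifier(X_shape, X_local, y):
    """Train one base classifier per feature set and fuse their scores."""
    base_shape = SVC(probability=True)
    base_local = SVC(probability=True)

    # Out-of-fold predicted probabilities avoid leaking training labels
    # into the meta-level features.
    p_shape = cross_val_predict(base_shape, X_shape, y, cv=5, method="predict_proba")
    p_local = cross_val_predict(base_local, X_local, y, cv=5, method="predict_proba")
    meta_features = np.hstack([p_shape, p_local])

    meta = LogisticRegression(max_iter=1000).fit(meta_features, y)
    base_shape.fit(X_shape, y)   # refit base models on all data for deployment
    base_local.fit(X_local, y)
    return base_shape, base_local, meta
```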

  • Research Article
  • Citations: 45
  • 10.3390/e22101105
Fractional Norms and Quasinorms Do Not Help to Overcome the Curse of Dimensionality.
  • Sep 30, 2020
  • Entropy
  • Evgeny M Mirkes + 2 more

The curse of dimensionality causes well-known and widely discussed problems for machine learning methods. There is a hypothesis that using the Manhattan distance, and even fractional quasinorms (for p less than 1), can help to overcome the curse of dimensionality in classification problems. In this study, we systematically test this hypothesis. It is illustrated that fractional quasinorms have a greater relative contrast and coefficient of variation than the Euclidean norm, but it is shown that this difference decays with increasing space dimension. It has been demonstrated that the concentration of distances shows qualitatively the same behaviour for all tested norms and quasinorms. It is shown that a greater relative contrast does not mean a better classification quality. It was revealed that for different databases the best (worst) performance was achieved under different norms (quasinorms). A systematic comparison shows that the difference in the performance of kNN classifiers at p = 0.5, 1, and 2 is statistically insignificant. Analysis of the curse and blessing of dimensionality requires a careful definition of data dimensionality, which rarely coincides with the number of attributes. We systematically examined several intrinsic dimensions of the data.
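
The relative-contrast quantity compared across norms in this study can be estimated empirically with a few lines of code; the uniform random data and sample sizes below are illustrative stand-ins for the paper's benchmark databases.

```python
import numpy as np

def relative_contrast(X, query, p):
    """(D_max - D_min) / D_min for the Minkowski 'distance' with exponent p.

    For p < 1 this is only a quasinorm (the triangle inequality fails),
    but the contrast is still well defined.
    """
    d = np.sum(np.abs(X - query) ** p, axis=1) ** (1.0 / p)
    return (d.max() - d.min()) / d.min()

rng = np.random.default_rng(0)
for dim in (10, 100, 1000):
    X = rng.uniform(size=(2000, dim))
    q = rng.uniform(size=dim)
    print(dim, {p: round(relative_contrast(X, q, p), 3) for p in (0.5, 1.0, 2.0)})
    # the contrast is larger for smaller p but shrinks with dim for every p
```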

  • Research Article
  • Citations: 393
  • 10.1002/(sici)1097-0258(19970215)16:3<285::aid-sim535>3.0.co;2-#
Toward a curse of dimensionality appropriate (CODA) asymptotic theory for semi-parametric models.
  • Feb 15, 1997
  • Statistics in Medicine
  • James M Robins + 1 more

  • Research Article
  • Citations: 3
  • 10.2495/eco030452
Predicting Terminal Time and Final Crop Number for a Forest Plantation Stand: Pontryagin's Maximum Principle
  • May 28, 2003
  • WIT Transactions on Ecology and the Environment
  • Chikumbo + 1 more

A lot of work has gone into developing management strategies for forest plantation stands. Analysts have resorted to the use of dynamic programming to find an optimum management strategy for a stand. The 'curse of dimensionality' in dynamic programming computations has led to the pursuit of alternative heuristic search algorithms, which are plagued with an inherent inability to verify optimality. Optimality in stand management has always been a lingering issue in the forest literature, since stand optimisation formulations started appearing in forest science journals in the early 1960s. Pontryagin's Maximum Principle was long cited as a potential exact solution technique, but there was never a demonstration of this technique with stand measurement data. However, using dynamical models as building blocks and based on stand measurement data, the authors have demonstrated the applicability of Pontryagin's Maximum Principle, avoiding the curse of dimensionality. None of the formulations demonstrated so far have addressed the terminal-time and constraint problem. What is presented in this paper is a combined optimal control and parameter selection formulation, using Pontryagin's Maximum Principle. The parameters are the initial planting density and the final crop number, while the optimal control is the harvesting strategy, all estimated for a specific rotation length.
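
For context, the necessary condition at the heart of Pontryagin's Maximum Principle used in this formulation can be written as follows; the state x, control u, and costate λ are generic symbols rather than the paper's stand variables.

```latex
% With dynamics \dot x = f(x,u,t) and running objective f_0(x,u,t), define the
% Hamiltonian and require the optimal control to maximize it pointwise in time:
\[
  H(x,u,\lambda,t) = f_0(x,u,t) + \lambda^{\top} f(x,u,t), \qquad
  u^{*}(t) = \arg\max_{u \in U} H\bigl(x^{*}(t),u,\lambda(t),t\bigr), \qquad
  \dot{\lambda} = -\frac{\partial H}{\partial x} .
\]
% Because the maximization is pointwise in t, no discretized state grid is
% needed, which is how the curse of dimensionality of dynamic programming is
% avoided.
```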

  • Research Article
  • Citations: 7
  • 10.5075/epfl-thesis-4292
Multimodal feature extraction and fusion for audio-visual speech recognition
  • Jan 1, 2009
  • Infoscience (Ecole Polytechnique Fédérale de Lausanne)
  • Mihai Gurban

  • Research Article
  • Citations: 277
  • 10.1016/j.eswa.2007.01.002
A hybrid cooperative–comprehensive learning based PSO algorithm for image segmentation using multilevel thresholding
  • Jan 27, 2007
  • Expert Systems with Applications
  • M Maitra + 1 more

  • Conference Article
  • Citations: 9
  • 10.1109/fbie.2008.101
Ovarian Cancer Mass Spectrometry Data Analysis Based on ICA Algorithm
  • Dec 1, 2008
  • Zhaoxin Wang + 2 more

Independent component analysis (ICA) can find hidden information in mass spectrometry (MS) data. However, ICA does not take advantage of prior information in the construction of the subspace, as it takes no account of class information. In this research, a supervised version of ICA (SICA) is introduced. Due to the large amount of information contained within MS data, the 'curse of dimensionality' must be addressed before ICA and SICA are employed. This paper examines the performance of ICA and SICA using the following feature extraction and feature selection algorithms on ovarian cancer MS data: principal component analysis (PCA), 2nd-PCA, and the T-test. Experimental results show that ICA and SICA can achieve good classification results on the ovarian cancer MS dataset pre-processed by the T-test.
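
A minimal sketch of the kind of pipeline evaluated here, a univariate filter followed by ICA and a classifier, is shown below; the SelectKBest F-test filter standing in for the paper's T-test step, the component counts, and the linear SVM are assumptions, and the supervised SICA variant is not implemented.

```python
from sklearn.decomposition import FastICA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Reduce the raw m/z features first (the 'curse of dimensionality' step),
# then extract statistically independent components, then classify.
pipeline = Pipeline([
    ("filter", SelectKBest(f_classif, k=200)),          # univariate filter
    ("ica", FastICA(n_components=20, max_iter=1000)),
    ("clf", SVC(kernel="linear")),
])

# Usage (with a labelled MS matrix X and class vector y):
# pipeline.fit(X_train, y_train); pipeline.score(X_test, y_test)
```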

  • Conference Article
  • Citations: 15
  • 10.1145/3365109.3368792
Feature Selection Methods for Linked Data: Limitations, Capabilities and Potentials
  • Dec 2, 2019
  • Marianne Cherrington + 5 more

Feature selection is an important pre-processing, data mining, and knowledge discovery tool for data analysis. By eliminating redundant and irrelevant features from high-dimensional data, feature selection diminishes the 'curse of dimensionality' to improve performance. Data are becoming increasingly complex; heterogeneous data may often be viewed as natural collections of linked objects. Linked data are structured data that are connected with other data sources through the use of semantic queries. They are increasingly prevalent in social media websites and biological networks. Many feature selection methods assume independent and identically distributed (IID) data, a condition violated by linked data. In this paper, a review of current feature selection techniques for linked data is presented. Several approaches are examined in various contexts so that performance issues and ongoing challenges can be assessed. The major contribution of this paper is to underscore contemporary uses and limitations of linked-data feature selection techniques, with the purpose of informing existing capabilities and current potentials for key areas of future research and application.
