Dimension Reduction
The high dimensionality of datapoints often constitutes an obstacle to efficient computations. This chapter investigates three workarounds that replace the datapoints with substitutes selected from a lower-dimensional set. The first workaround is principal component analysis, where the lower-dimensional set is a linear space spanned by the top singular vectors of the data matrix. The second workaround is a Johnson–Lindenstrauss projection, where the lower-dimensional set is a random linear space. The third workaround is locally linear embedding, where the lower-dimensional set is no longer chosen as a linear space.
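As a minimal sketch of the three workarounds on placeholder data (scikit-learn's GaussianRandomProjection and LocallyLinearEmbedding stand in for the Johnson–Lindenstrauss projection and locally linear embedding; X is random rather than a real dataset):

```python
import numpy as np
from sklearn.random_projection import GaussianRandomProjection
from sklearn.manifold import LocallyLinearEmbedding

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 100))          # 500 datapoints in dimension 100
k = 10                                       # target dimension

# 1) Principal component analysis: project onto the top-k right singular vectors.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
X_pca = Xc @ Vt[:k].T

# 2) Johnson-Lindenstrauss: project onto a random k-dimensional subspace.
X_jl = GaussianRandomProjection(n_components=k, random_state=0).fit_transform(X)

# 3) Locally linear embedding: a nonlinear, neighborhood-preserving embedding.
X_lle = LocallyLinearEmbedding(n_components=k, n_neighbors=15).fit_transform(X)

print(X_pca.shape, X_jl.shape, X_lle.shape)  # each is (500, 10)
```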
- Research Article
- 10.1109/access.2018.2883460
- Jan 1, 2018
- IEEE Access
The Internet of Brain Things produces large amounts of high-dimensional data, and reducing this data from high to low dimensionality while maintaining the similarity between data points is essential for preserving the operating speed of the Internet of Brain Things. Traditional distributed dimensionality-reduction and reconstruction algorithms for such high-dimensional data achieve poor dimensionality reduction and suffer serious data distortion after reconstruction. A method for dimensionality reduction and reconstruction of high-dimensional data in the Internet of Brain Things is therefore proposed. Linear discriminant analysis is used to construct and solve the projection matrix of the high-dimensional data. Based on this solution, an improved latent variable model is used to establish a dimensionality-reduction model for high-dimensional big data in the Internet of Brain Things. The fitness of the data after dimensionality reduction is calculated with a quantum immune clonal algorithm to determine the optimal individual and optimal solution, and data reconstruction is realized by grouping the optimal solutions. Experimental results show that the proposed algorithm effectively improves the dimensionality reduction of high-dimensional data in the Internet of Brain Things: the reconstructed data retain accurate information, their reliability is high, the computational complexity is low, the required storage space is small, and the method generalizes well.
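The abstract does not give enough detail to reproduce the improved latent variable model or the quantum immune clonal algorithm; as a hedged sketch of just the linear-discriminant-analysis projection step it builds on (random placeholder data, not Internet-of-Brain-Things data):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
# Toy stand-in: 300 samples, 50 features, 4 classes.
X = rng.standard_normal((300, 50))
y = rng.integers(0, 4, size=300)

# LDA learns a projection matrix with at most (n_classes - 1) columns that
# maximizes between-class scatter relative to within-class scatter.
lda = LinearDiscriminantAnalysis(n_components=3)
X_low = lda.fit_transform(X, y)
print(X_low.shape)   # (300, 3)
```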
- Research Article
- 10.15407/pmach2021.02.037
- Jun 30, 2021
- Journal of Mechanical Engineering
The problem of reducing the axial dimensions of steam turbine diaphragms arises in steam turbine modernization performed by increasing the number of reactive blading stages while using existing foundations. The suitability of diaphragm design versions with established steam flow characteristics was evaluated under constraints on short- and long-term strength conditions, as well as on the accumulation of axial deflections due to creep. For the computational research, a methodology based on the finite element method and Yu. M. Rabotnov’s theory of strain aging was introduced. The calculation of creep was reduced to solving an elastic-plastic problem with a deformation curve represented by an isochronous creep curve for the chosen time. Software was used that provides automated construction of the original computer model of the diaphragm from guide-vane profile drawings, axial cross-sections of the diaphragm rim and body, and several geometric parameters. The calculated model of a welded diaphragm reproduces the main essential features of the structure, the material properties of its elements, and the steam load. Exploratory studies of diaphragms with reduced axial dimensions were performed on the example of the second- and third-stage diaphragms of the high-pressure cylinder of the K-325-23.5 steam turbine. The original second- and third-stage diaphragm designs were considered basic, and the alternative designs were compared against them in terms of strength and rigidity parameters. Calculated data for the basic diaphragm design versions for 100 thousand operating hours were obtained. According to the calculations, maximum deflections occur at the diaphragm edges, and the stresses, which are largest at the points where the guide vanes are attached to the diaphragm rim and body, undergo a significant redistribution due to creep. Two approaches to reducing the axial dimensions of the second-stage diaphragm of the high-pressure cylinder were considered. In the first approach, the reduction was achieved by proportionally shrinking the guide-vane profile with a corresponding increase in the number of guide vanes. In the second approach, the profile remained unchanged, but the axial dimensions of the diaphragm rim and body were reduced. The strength parameters, both in the elastic state at the beginning of operation and under creep conditions, as well as the accumulation of axial deflections, were investigated. Based on comparisons with the basic design, the second approach was found to be more effective. Additional recommendations are given on the use of more heat-resistant steels for outlet guide vanes and on the conditions of diaphragm attachment in the turbine casing.
- Research Article
- 10.1142/s021800141451001x
- Mar 1, 2014
- International Journal of Pattern Recognition and Artificial Intelligence
Recently, many dimensionality reduction (DR) algorithms have been developed and successfully applied to feature extraction and representation in pattern classification. However, many applications need to re-project the features to the original space, and unfortunately most DR algorithms cannot perform reconstruction. Based on the manifold assumption, this paper proposes a General Manifold Reconstruction Framework (GMRF) to reconstruct the original data from low-dimensional DR results. Compared with existing reconstruction algorithms, the framework has two significant advantages. First, it is independent of the DR algorithm: no matter which DR algorithm is used, the framework can recover the structure of the original data from the DR results. Second, the framework is space saving: it does not need to store any training sample after training, and the storage GMRF needs for reconstruction is far less than that of the training samples. Experiments on different datasets demonstrate that the framework performs well in reconstruction.
- Conference Article
- 10.1109/icdm.2009.34
- Dec 1, 2009
Dimension Reduction (DR) algorithms are generally categorized into feature extraction and feature selection algorithms. In the past, few works have been done to contrast and unify the two algorithm categories. In this work, we introduce a matrix-trace-oriented optimization framework that provides a unifying view of both feature extraction and feature selection algorithms. We show that the unified view of DR algorithms allows us to discover some essential relationships among many state-of-the-art DR algorithms. Inspired by these insights, we propose to synthesize an unlimited number of novel DR algorithms by combining, mapping, and integrating the state-of-the-art algorithms. We present examples of newly synthesized DR algorithms with experimental results to show the effectiveness of our automatically synthesized algorithms.
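As an illustration of the kind of trace objective such a unifying framework covers (these are the standard textbook formulations, not the paper's notation), PCA and LDA can both be written as trace optimizations over a projection matrix $W$:

```latex
% PCA: maximize projected variance (S_t = total scatter matrix)
\max_{W^\top W = I} \operatorname{tr}\!\left( W^\top S_t W \right)

% LDA: maximize between-class scatter relative to within-class scatter
\max_{W} \operatorname{tr}\!\left( (W^\top S_w W)^{-1} W^\top S_b W \right)
```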
- Research Article
- 10.2139/ssrn.3854519
- May 27, 2021
- SSRN Electronic Journal
While single-cell “omics”-based measurements hold the promise of unparalleled biological insight, they remain a challenge to analyze owing to their high-dimensional nature. As a result, Dimensionality Reduction (DR) algorithms are necessary for data visualization and for downstream quantitative analysis. The lack of a principled methodology for separating signal from noise in DR algorithmic outputs has limited the confident application of these methods in unsupervised analyses of single-cell data, greatly hampering researchers’ ability to make data-driven discoveries. In this work we present an approach to quality assessment, EMBEDR, that works in conjunction with any DR algorithm to distinguish signal from noise in dimensionally-reduced representations of high-dimensional data. We apply EMBEDR to t-SNE- and UMAP-generated representations of published scRNA-seq data, revealing where lower-dimensional representations of the data are faithful renditions of biological signal and where they are more consistent with noise. EMBEDR produces easily interpreted p-values for each cell in a data set, facilitating the comparison of different DR methods and allowing optimization of their global hyperparameters. Most compellingly, EMBEDR allows for the analysis of single-cell data at single-cell resolution, allowing DR methods to be used in a cell-wise optimal manner. Applying this technique to real data results in a biologically interpretable view of the data with no user supervision. We demonstrate the utility of EMBEDR in the context of several data sets and DR algorithms, illustrating its robustness and flexibility as well as its potential for making rigorous, quantitative analyses of single-cell omics data. EMBEDR is available as a Python package for immediate use.
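EMBEDR's actual procedure is not reproduced here. As a loose, hypothetical sketch of the general idea of separating signal from noise per point, one can compare a per-cell neighborhood-preservation score on the real embedding against the same score computed on embeddings of permuted (structure-free) data, yielding empirical p-values; scikit-learn's t-SNE and random placeholder data stand in for the real pipeline:

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.neighbors import NearestNeighbors

def knn_preservation(X_high, X_low, k=15):
    """Per-point fraction of high-dimensional k-NN retained in the embedding."""
    _, idx_high = NearestNeighbors(n_neighbors=k + 1).fit(X_high).kneighbors(X_high)
    _, idx_low = NearestNeighbors(n_neighbors=k + 1).fit(X_low).kneighbors(X_low)
    return np.array([len(set(a[1:]) & set(b[1:])) / k
                     for a, b in zip(idx_high, idx_low)])

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 50))       # toy "expression" matrix, 300 cells x 50 genes

scores = knn_preservation(X, TSNE(n_components=2, random_state=0).fit_transform(X))

# Null model: permute each feature independently to destroy joint structure,
# then embed and score; repeat a few times to build a null distribution.
null_scores = []
for seed in range(3):
    X_null = np.apply_along_axis(rng.permutation, 0, X)
    emb_null = TSNE(n_components=2, random_state=seed).fit_transform(X_null)
    null_scores.append(knn_preservation(X_null, emb_null))
null_scores = np.concatenate(null_scores)

# Empirical p-value per cell: probability that noise alone scores at least as well.
p_values = np.array([(null_scores >= s).mean() for s in scores])
print(p_values[:10])
```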
- Research Article
- 10.1038/nnano.2014.103
- Jun 15, 2014
- Nature Nanotechnology
The selectivity and speed of many biological transport processes transpire from a 'reduction of dimensionality' that confines diffusion to one or two dimensions instead of three. This behaviour remains highly sought after on polymeric surfaces as a means to expedite diffusional search processes in molecular engineered systems. Here, we have reconstituted the two-dimensional diffusion of colloidal particles on a molecular brush surface. The surface is composed of phenylalanine-glycine nucleoporins (FG Nups)--intrinsically disordered proteins that facilitate selective transport through nuclear pore complexes in eukaryotic cells. Local and ensemble-level experiments involving optical trapping using a photonic force microscope and particle tracking by video microscopy, respectively, reveal that 1-µm-sized colloidal particles bearing nuclear transport receptors called karyopherins can exhibit behaviour that varies from highly localized to unhindered two-dimensional diffusion. Particle diffusivity is controlled by varying the amount of free karyopherins in solution, which modulates the multivalency of Kap-binding sites within the molecular brush. We conclude that the FG Nups resemble stimuli-responsive molecular 'velcro', which can impart 'reduction of dimensionality' as a means of biomimetic transport control in artificial environments.
- Research Article
- 10.23977/geors.2018.11012
- Jan 1, 2018
- Geoscience and Remote Sensing
Hyperspectral image (HSI) classification requires spectral dimensionality reduction and spatial filtering. While common dimensionality reduction and denoising methods use linear algebra, we propose a tensorial method that jointly achieves denoising and dimensionality reduction. Firstly, we propose a new method for pre-whitening the noise (PW) in HSI. Then we propose a method based on quadtree decomposition, adapted to tensor data, in order to take into account the local image characteristics in the multi-way Wiener filter (LMWF), which performs both noise and spectral dimensionality reduction and is referred to as PW-LMWFdr-(K1;K2;P3). The SVM classification algorithm is applied to the outputs of the dimensionality and noise reduction methods to compare their efficiency: the proposed PW-LMWFdr-(K1;K2;P3), PW-MWFdr-(K1;K2;P3), and PCAdr and MNFdr associated with Wiener filtering.
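The tensorial Wiener filtering (PW-LMWF) is not reproduced here; as a hedged sketch of the kind of baseline it is compared against, PCA spectral dimensionality reduction followed by SVM classification of labeled pixels might look as follows on a synthetic stand-in for an HSI cube:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
H, W, B = 30, 30, 100                     # synthetic cube: 30x30 pixels, 100 bands
cube = rng.standard_normal((H, W, B))
labels = rng.integers(0, 5, size=(H, W))  # hypothetical ground-truth classes

X = cube.reshape(-1, B)                   # pixels as rows, bands as columns
y = labels.ravel()

X_red = PCA(n_components=10).fit_transform(X)   # spectral dimensionality reduction

X_tr, X_te, y_tr, y_te = train_test_split(X_red, y, test_size=0.3, random_state=0)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
print("accuracy on the synthetic cube:", clf.score(X_te, y_te))
```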
- Conference Article
- 10.1109/bicta.2010.5645247
- Sep 1, 2010
Microarray analysis and visualization is very helpful for biologists and clinicians to understand gene expression in cells and to facilitate diagnosis and treatment of patients. However, a typical microarray dataset has thousands of features and a very small number of observations. This very high-dimensional data carries a massive amount of information, which often contains noise, non-useful information, and only a small number of features relevant to the disease or genotype. This paper proposes a framework for very high-dimensional data reduction based on three technologies: feature selection, linear dimensionality reduction, and non-linear dimensionality reduction. Feature selection based on mutual information is proposed for filtering features and selecting the most relevant features with minimum redundancy. A kernel linear dimensionality reduction method is also used to extract the latent variables from a high-dimensional data set. In addition, a non-linear dimensionality reduction based on locally linear embedding is used to reduce the dimension and visualize the data. Experimental results are presented to show the outputs of each step and the efficiency of this framework.
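A minimal sketch of the three-stage framework using scikit-learn stand-ins (mutual-information feature filtering, kernel PCA as the kernel DR step, and locally linear embedding for the final 2-D visualization); the data is a random placeholder rather than a microarray dataset, and SelectKBest does not enforce the minimum-redundancy criterion, so this only approximates the selection step:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.decomposition import KernelPCA
from sklearn.manifold import LocallyLinearEmbedding

rng = np.random.default_rng(0)
X = rng.standard_normal((60, 2000))        # 60 samples, 2000 "genes"
y = rng.integers(0, 2, size=60)            # hypothetical disease labels

# Stage 1: mutual-information-based filtering of the most relevant features.
X_sel = SelectKBest(mutual_info_classif, k=200).fit_transform(X, y)

# Stage 2: kernel DR to extract latent variables.
X_kpca = KernelPCA(n_components=20, kernel="rbf").fit_transform(X_sel)

# Stage 3: locally linear embedding down to 2-D for visualization.
X_2d = LocallyLinearEmbedding(n_components=2, n_neighbors=10).fit_transform(X_kpca)
print(X_2d.shape)   # (60, 2)
```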
- Research Article
- 10.1016/s0550-3213(02)00965-3
- Nov 7, 2002
- Nuclear Physics, Section B
The anomaly in the central charge of the supersymmetric kink from dimensional regularization and reduction
- Conference Article
- 10.1109/icde.2012.115
- Apr 1, 2012
Dimensionality reduction is essential in text mining since the dimensionality of text documents could easily reach several tens of thousands. Most recent efforts on dimensionality reduction, however, are not adequate to large document databases due to lack of scalability. We hence propose a new type of simple but effective dimensionality reduction, called horizontal (dimensionality) reduction, for large document databases. Horizontal reduction converts each text document to a few bitmap vectors and provides tight lower bounds of inter-document distances using those bitmap vectors. Bitmap representation is very simple and extremely fast, and its instance-based nature makes it suitable for large and dynamic document databases. Using the proposed horizontal reduction, we develop an efficient k-nearest neighbor (k-NN) search algorithm for text mining such as classification and clustering, and we formally prove its correctness. The proposed algorithm decreases I/O and CPU overheads simultaneously since horizontal reduction (1) reduces the number of accesses to documents significantly by exploiting the bitmap-based lower bounds in filtering dissimilar documents at an early stage, and accordingly, (2) decreases the number of CPU-intensive computations for obtaining a real distance between high-dimensional document vectors. Extensive experimental results show that horizontal reduction improves the performance of the reduction (preprocessing) process by one to two orders of magnitude compared with existing reduction techniques, and our k-NN search algorithm significantly outperforms the existing ones by one to three orders of magnitude.
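The paper's exact bitmap encoding is not specified in the abstract. As a hedged sketch of the filtering idea, assume binary term-presence vectors and a bitmap formed by OR-ing terms within buckets: the Hamming distance between two bitmaps then lower-bounds the number of differing terms, so its square root lower-bounds the Euclidean distance and can be used to skip exact distance computations during a k-NN scan:

```python
import numpy as np

def make_bitmap(x, n_buckets=64):
    """OR the binary term-presence vector within equal-width buckets of dimensions."""
    return np.array([chunk.any() for chunk in np.array_split(x, n_buckets)])

def lower_bound(bm_a, bm_b):
    # Each differing bitmap bit implies at least one differing term in that bucket,
    # so the bitmap Hamming distance lower-bounds the squared Euclidean distance
    # between the binary document vectors.
    return np.sqrt(np.count_nonzero(bm_a != bm_b))

def knn_with_filtering(query, docs, bitmaps, k=5):
    q_bm = make_bitmap(query, len(bitmaps[0]))
    best = []                                   # list of (distance, doc index)
    for i, (doc, bm) in enumerate(zip(docs, bitmaps)):
        if len(best) == k and lower_bound(q_bm, bm) >= max(d for d, _ in best):
            continue                            # cheap filter: cannot beat current k-th best
        dist = np.sqrt(np.count_nonzero(query != doc))   # exact distance, binary vectors
        best = sorted(best + [(dist, i)])[:k]
    return best

rng = np.random.default_rng(0)
docs = (rng.random((1000, 5000)) < 0.01).astype(np.uint8)   # sparse binary term vectors
bitmaps = [make_bitmap(d) for d in docs]
print(knn_with_filtering(docs[0], docs, bitmaps))
```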
- Research Article
- 10.1080/1350486x.2015.1110492
- Nov 2, 2015
- Applied Mathematical Finance
One-way coupling often occurs in multi-dimensional models in finance. In this paper, we present a dimension reduction technique for Monte Carlo (MC) methods, referred to as drMC, that exploits this structure for pricing plain-vanilla European options under an N-dimensional one-way coupled model, where N is arbitrary. The dimension reduction also often produces a significant variance reduction. The drMC method is a dimension reduction technique built upon (i) the conditional MC technique applied to the one factor that does not depend on any other factors in the model, and (ii) the derivation, via Fourier transforms, of a closed-form solution to the conditional partial differential equation (PDE) that arises. In the drMC approach, the option price can be computed simply by taking the expectation of this closed-form solution. Hence, the approach results in a powerful dimension reduction from N to one, which often results in a significant variance reduction as well, since the variance associated with the other factors in the original model is completely removed from the drMC simulation. Moreover, under the drMC framework, hedging parameters, or Greeks, can be computed in a much more efficient way than in traditional MC techniques. A variance reduction analysis of the method is presented and numerical results illustrating the method's efficiency are provided.
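The drMC closed form depends on model-specific Fourier transforms and is not reproduced here. As a loose illustration of ingredient (i), the conditional MC step, consider a European call under a stochastic-variance factor assumed independent of the asset's Brownian motion: conditional on the simulated variance path, the price is the Black-Scholes formula evaluated at the realized average variance, so only the variance factor needs to be simulated (all parameter values below are hypothetical):

```python
import numpy as np
from scipy.stats import norm

def bs_call(S0, K, r, sigma, T):
    """Black-Scholes European call price (vectorized over sigma)."""
    d1 = (np.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S0 * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

rng = np.random.default_rng(0)
S0, K, r, T = 100.0, 100.0, 0.02, 1.0
kappa, theta, xi, v0 = 2.0, 0.04, 0.3, 0.04      # hypothetical variance dynamics
n_paths, n_steps = 20_000, 200
dt = T / n_steps

# Simulate only the variance factor (assumed independent of the asset's driving
# Brownian motion); the asset dimension is integrated out analytically below.
v = np.full(n_paths, v0)
avg_var = np.zeros(n_paths)
for _ in range(n_steps):
    v = np.maximum(v + kappa * (theta - v) * dt
                   + xi * np.sqrt(v * dt) * rng.standard_normal(n_paths), 0.0)
    avg_var += v * dt / T

# Conditional on the realized average variance, the call price is closed form,
# so the estimator averages Black-Scholes prices instead of discounted payoffs.
prices = bs_call(S0, K, r, np.sqrt(avg_var), T)
print("conditional MC estimate:", prices.mean(),
      "std error:", prices.std() / np.sqrt(n_paths))
```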
- Conference Article
- 10.1109/iv.2019.00046
- Jul 1, 2019
Dimensionality Reduction (DR) techniques are widely used to analyze and make sense of high-dimensional data. Each method is geared towards preserving a different aspect of the data: for example, some techniques favor neighborhood preservation whereas others favor distance preservation. While these DR techniques help users to represent their data, selecting a suitable DR technique is a complex task. Moreover, most DR techniques have additional parameters that affect the results, which makes choosing a technique even more difficult. Existing methods compare DR techniques using quality metrics, and some of them combine DR outputs by averaging projections, but they do not yet provide enough mechanisms to create a new DR according to user requirements. In this paper, we present a way to analyze and compare different DR techniques. It is an interactive assessment method that allows a user to explore known DR techniques, identify the differences between them, and create a new DR technique that combines existing techniques to match user expectations.
- Research Article
- 10.5589/m08-007
- Jan 1, 2008
- Canadian Journal of Remote Sensing
For dimensionality reduction (DR) of a hyperspectral data cube or band selection, it would be desirable to have one method that is suitable for all remote sensing applications. In reality, however, this is not possible: a specific remote sensing application requires a specific DR or band selection method that best suits it. In this paper, three DR methods, namely principal component analysis (PCA), wavelet, and minimum noise fraction (MNF), and one band selection method were evaluated and compared. Based on the experiments, the following was observed. For endmember extraction, the PCA DR, wavelet DR, and band selection found all five endmembers, whereas the MNF DR missed one endmember. For mineral detection, the MNF DR produced the map closest to the true map compared with the other DR and band selection methods. For classification, the PCA DR produced the highest classification rates, whereas the other methods yielded lower classification rates.
- Research Article
- 10.1007/s10618-019-00616-4
- Feb 20, 2019
- Data Mining and Knowledge Discovery
Unsupervised matrix-factorization-based dimensionality reduction (DR) techniques are popularly used for feature engineering with the goal of improving the generalization performance of predictive models, especially with massive, sparse feature sets. Often DR is employed for the same purpose as supervised regularization and other forms of complexity control: exploiting a bias/variance tradeoff to mitigate overfitting. Contradicting this practice, there is consensus among existing expert guidelines that supervised regularization is a superior way to improve predictive performance. However, these guidelines are not always followed for this sort of data, and it is not unusual to find DR used with no comparison to modeling with the full feature set. Further, the existing literature does not take into account that DR and supervised regularization are often used in conjunction. We experimentally compare binary classification performance using DR features versus the original features under numerous conditions: using a total of 97 binary classification tasks, 6 classifiers, 3 DR techniques, and 4 evaluation metrics. Crucially, we also experiment using varied methodologies to tune and evaluate various key hyperparameters. We find a very clear, but nuanced result. Using state-of-the-art hyperparameter-selection methods, applying DR does not add value beyond supervised regularization, and can often diminish performance. However, if regularization is not done well (e.g., one just uses the default regularization parameter), DR does have relatively better performance—but these approaches result in lower performance overall. These latter results provide an explanation for why practitioners may be continuing to use DR without undertaking the necessary comparison to using the original features. However, this practice seems generally wrongheaded in light of the main results, if the goal is to maximize generalization performance.
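A hedged sketch of the kind of comparison the study describes, with one classifier (logistic regression), one DR technique (truncated SVD), and toy data standing in for the 97 real tasks: tune the regularization strength on the full feature set, separately tune both the SVD rank and the regularization on the reduced features, and compare held-out performance:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=2000, n_features=500, n_informative=30, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Full features with properly tuned L2 regularization.
full = GridSearchCV(LogisticRegression(max_iter=2000),
                    {"C": np.logspace(-3, 3, 7)}, cv=5, scoring="roc_auc")
full.fit(X_tr, y_tr)

# DR features (TruncatedSVD) with the rank and the regularization both tuned.
dr = GridSearchCV(Pipeline([("svd", TruncatedSVD(random_state=0)),
                            ("clf", LogisticRegression(max_iter=2000))]),
                  {"svd__n_components": [10, 50, 100],
                   "clf__C": np.logspace(-3, 3, 7)}, cv=5, scoring="roc_auc")
dr.fit(X_tr, y_tr)

print("full features AUC:", full.score(X_te, y_te))
print("SVD features AUC :", dr.score(X_te, y_te))
```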
- Conference Article
- 10.1109/isise.2008.18
- Dec 1, 2008
Firstly, the author generalizes the indiscernibility relation to a similarity relation and gives a definition of the lambda-discernibility matrix; secondly, an attribute reduction algorithm for decision tables with continuous condition attributes is put forward; thirdly, an algorithm for computing the significance of each condition attribute is put forward based on the properties of the lambda-discernibility matrix; finally, the time complexity of the attribute reduction algorithm is analyzed, and the rationality and effectiveness of the algorithm are demonstrated through an example.
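The abstract does not spell out the exact definitions; under a common rough-set reading (an assumption, not the paper's formulation), two objects are lambda-similar on an attribute when their values differ by at most lambda, the matrix entry for each pair of objects with different decisions collects the attributes that do discern them, and an attribute's significance is scored by how often it appears in those entries:

```python
import numpy as np

def lambda_discernibility(X, y, lam=0.1):
    """For each pair of objects with different decisions, the set of condition
    attributes whose values differ by more than lam (i.e., that discern the pair)."""
    n, m = X.shape
    matrix = {}
    for i in range(n):
        for j in range(i + 1, n):
            if y[i] != y[j]:
                matrix[(i, j)] = {a for a in range(m) if abs(X[i, a] - X[j, a]) > lam}
    return matrix

def attribute_significance(matrix, m):
    """Significance as the fraction of matrix entries in which an attribute appears."""
    counts = np.zeros(m)
    for attrs in matrix.values():
        for a in attrs:
            counts[a] += 1
    return counts / max(len(matrix), 1)

rng = np.random.default_rng(0)
X = rng.random((20, 4))                 # 20 objects, 4 continuous condition attributes
y = rng.integers(0, 2, size=20)         # decision attribute

M = lambda_discernibility(X, y, lam=0.3)
print(attribute_significance(M, X.shape[1]))
```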