A New Approach to Improve the Topological Stability in Non-Linear Dimensionality Reduction

Mohammed Elhenawy, Mahmoud Masoud, Sebastien Glaser, Andry Rakotonirainy

doi:10.1109/access.2020.2973921

Open Access

https://doi.org/10.1109/access.2020.2973921

Journal: IEEE Access
Publication Date: Jan 1, 2020
Citations: 16
License type: CC BY 4.0

Affiliation: Queensland University of Technology

Abstract

Dimensionality reduction in the machine learning field mitigates the undesired properties of high-dimensional spaces to facilitate classification, compression, and visualization of high-dimensional data. During the last decade, researchers have proposed many new non-linear techniques for dimensionality reduction based on the assumption that the data lie on or near a complex low-dimensional manifold embedded in the high-dimensional space. These techniques aim to identify and extract the manifold from the high-dimensional space. Isomap is one of the widely used low-dimensional embedding methods; it combines geodesic distances on a weighted graph with classical scaling (metric multidimensional scaling). Isomap chooses the k-nearest neighbors based on distance only, which causes bridges and topological instability. In this paper, we propose a new algorithm for finding the nearest neighbors that minimizes the number of short-circuit errors and thus improves topological stability. We assume that any point on the manifold and its nearest neighbors form a vector subspace, and that the orthogonal to that subspace is orthogonal to all vectors spanning it. Therefore, in the proposed algorithm, the point on the manifold and its nearest neighbors (i.e., the number of nearest neighbors selected according to the required subspace's dimension) are used to find the basis of the subspace and the orthogonal to that subspace, which belongs to the orthogonal complement. Then, candidate neighbor points are added to the nearest neighbors based on the distance and the angle between each candidate point and the orthogonal to the subspace. The new algorithm is tested using low-dimensional (3D) synthesized datasets and high-dimensional real datasets.
The superior performance of the new algorithm in choosing the nearest neighbors is confirmed by visually inspecting its ability to find the correct $2D$ representation of the synthesized datasets at different values of $k$. Moreover, in the case of the high-dimensional real datasets, the new algorithm yields a lower residual variance than the standard Isomap. Finally, we investigated using the new algorithm to find the nearest neighbors for the Locally Linear Embedding (LLE) algorithm. We tested the LLE variant using synthesized datasets and high-dimensional real datasets, and the results were promising.

Highlights

Dimensionality reduction is the process of transforming a high-dimensional dataset into a meaningful representation of reduced dimensionality.
The ideal goal of dimensionality reduction is to discover the minimum number of parameters needed to account for the observed properties of the data, which is called the intrinsic dimensionality of the data [1].
The results show a good low-dimensional representation at k = 3 and k = 10 and a perfect representation at k = 5, which is to our mind better than the t-Distributed Stochastic Neighbor Embedding (t-SNE) representation.

Summary

INTRODUCTION

Dimensionality reduction is the process of transforming a high-dimensional dataset into a meaningful representation of reduced dimensionality. Isomap estimates the geodesic distances between all pairs of points on the manifold M. Its ability to estimate the intrinsic geometry of a data manifold depends on choosing the correct neighbors for each point. Defining the connectivity of each data point via its nearest Euclidean neighbors in the high-dimensional space is vulnerable to short-circuit errors [8]. These errors can occur when folds of the manifold on which the data points lie are so close that the distance between points on different folds is smaller than the true neighborhood distance. The proposed algorithm therefore also considers the angle between each candidate neighbor and the orthogonal to the local subspace; this angle should be close to 90 degrees for the candidate to be accepted as a neighbor.
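The angle test described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes 3D points and a 2D local subspace, so the orthogonal can be taken as the cross product of the vectors to the two nearest neighbors. The function name `select_neighbors` and the tolerance `tol_deg` are hypothetical, and the exact rule the paper uses to combine distance and angle is not given in this summary.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _norm(a):
    return math.sqrt(_dot(a, a))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def select_neighbors(points, i, k, tol_deg=15.0):
    """Pick up to k neighbors of points[i] (3D only, illustrative).

    The two nearest points seed a 2D local subspace. Further candidates
    are visited in order of distance and kept only if the angle between
    the candidate direction and the subspace normal is within tol_deg
    of 90 degrees. tol_deg is an assumed parameter, not from the paper.
    """
    p = points[i]
    order = sorted((j for j in range(len(points)) if j != i),
                   key=lambda j: _norm(_sub(points[j], p)))
    if k <= 2 or len(order) <= 2:
        return order[:k]
    seeds = order[:2]
    normal = _cross(_sub(points[seeds[0]], p), _sub(points[seeds[1]], p))
    if _norm(normal) == 0.0:
        # Degenerate seeds (collinear or duplicate): fall back to plain kNN.
        return order[:k]
    chosen = list(seeds)
    for j in order[2:]:
        if len(chosen) == k:
            break
        v = _sub(points[j], p)
        cos_angle = _dot(v, normal) / (_norm(v) * _norm(normal))
        angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
        if abs(angle - 90.0) <= tol_deg:
            chosen.append(j)
    return chosen

# Point 0 lies on the sheet z = 0; point 4 sits on a nearby fold above it.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
          (-1.0, 0.0, 0.0), (0.1, 0.1, 1.2), (0.0, -1.5, 0.0)]
print(select_neighbors(points, 0, 4))  # [1, 2, 3, 5]
```

In this toy example, plain distance-based selection with k = 4 would pick point 4 and create a bridge between the two folds; the angle test skips it in favor of the more distant in-plane point 5. The sketch relies on the two nearest points being genuine neighbors, which mirrors the assumption stated in the abstract.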

THEORETICAL BACKGROUND AND METHODS

EXPERIMENTAL WORK

CONCLUSION
