Global and local structure preserving GPU t-SNE methods for large-scale applications

Bruno Henrique Meyer, Aurora Trinidad Ramirez Pozo, Wagner M Nunan Zola

doi:10.1016/j.eswa.2022.116918

Abstract

Dimensionality reduction techniques such as t-distributed stochastic neighbor embedding (t-SNE) have become essential for visualizing large-scale datasets. State-of-the-art t-SNE-based techniques rely on a variety of methods to take advantage of GPU parallelism.

The major contribution of this work is a new approach named simulated wide-warp anchor t-SNE (SWW-AtSNE), which combines the simulated wide-warp t-SNE (SWW-tSNE) technique with the anchor t-SNE (AtSNE) approach. SWW-AtSNE preserves global structures better than SWW-tSNE and runs faster than AtSNE. The preservation of global structures was measured with a new metric called medium neighborhood preservation (MNP). We also propose and study adaptations of SWW-tSNE. The adaptations consist of using a preprocessing technique or changing the initialization method using principal component analysis (PCA). SWW-AtSNE and the adaptations of SWW-tSNE also include the possibility of performing dimensionality reduction in two dimensions in addition to three dimensions.

Furthermore, this research compares different t-SNE-based techniques using large-scale datasets. Two essential criteria are used in the comparisons: the preservation of global and local structures. Moreover, this paper compares seven methods through two AI applications: reinforcement learning and generative adversarial networks (GANs).

The experimental results show that strategies such as the AtSNE method can improve dimensionality reduction quality with respect to the preservation of global structures. However, this method cannot achieve better results than other approaches, such as using principal component analysis in the initialization of t-SNE. Nevertheless, the ideas of both methods could be merged into a single technique in future studies.
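
To make the idea of measuring structure preservation concrete, the following is a minimal sketch of a generic k-nearest-neighbor overlap score between a high-dimensional dataset and its low-dimensional embedding. It is not the MNP metric proposed in the paper, whose definition is not given in this abstract; the function names `knn_indices` and `neighborhood_preservation`, the brute-force neighbor search, and the synthetic data are all assumptions made for illustration. Under this assumed score, a small `k` probes local structure, while a larger `k` probes coarser structure.

```python
import math
import random


def knn_indices(points, k):
    """Return, for each point, the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        # Brute-force search: sort all other points by Euclidean distance to p.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbors.append(set(others[:k]))
    return neighbors


def neighborhood_preservation(high, low, k):
    """Mean fraction of each point's k nearest neighbors shared by both spaces.

    Hypothetical illustrative score, not the paper's MNP metric.
    """
    nn_high = knn_indices(high, k)
    nn_low = knn_indices(low, k)
    overlaps = [len(a & b) / k for a, b in zip(nn_high, nn_low)]
    return sum(overlaps) / len(overlaps)


# Synthetic data (assumption): 60 points in 10 dimensions.
random.seed(0)
high = [[random.gauss(0, 1) for _ in range(10)] for _ in range(60)]

# An "embedding" that copies the data keeps every neighborhood intact.
identity_embedding = [row[:] for row in high]
# A random 2D layout ignores the data and should share few neighbors with it.
random_embedding = [[random.random(), random.random()] for _ in high]

score_identity = neighborhood_preservation(high, identity_embedding, k=5)
score_random = neighborhood_preservation(high, random_embedding, k=5)
print(score_identity, score_random)
```

In practice, the embedding would come from a t-SNE implementation rather than from the placeholder layouts used here, and an approximate nearest-neighbor search would replace the quadratic brute-force loop on large-scale datasets.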
