Fast t-SNE algorithm with forest of balanced LSH trees and hybrid computation of repulsive forces

Marek Orliński,Norbert Jankowski

doi:10.1016/j.knosys.2020.106318

Abstract

An acceleration of the well-known t-Stochastic Neighbor Embedding (t-SNE) (Hinton and Roweis, 2003; Maaten and Hinton, 2008) algorithm, probably the best (nonlinear) dimensionality reduction and visualization method, is proposed in this article.By using a specially-tuned forest of balanced trees constructed via locality sensitive hashing is improved significantly upon the results presented in Maaten (2014), achieving a complexity significantly closer to true O(nlogn), and vastly improving behavior for huge numbers of instances and attributes. Such acceleration removes the necessity to use PCA to reduce dimensionality before the start of t-SNE.Additionally, a fast hybrid method for repulsive forces computation (a part of the t-SNE algorithm), which is currently the fastest method known, is proposed.A parallelized version of our algorithm, characterized by a very good speedup factor, is proposed.

Full Text