Abstract

T-distributed stochastic neighbour embedding (t-SNE) is a widely used data visualisation technique. It differs from its predecessor, SNE, in its low-dimensional similarity kernel: the Gaussian kernel was replaced by the heavy-tailed Cauchy kernel, solving the 'crowding problem' of SNE. Here, we develop an efficient implementation of t-SNE for a t-distribution kernel with an arbitrary degree of freedom ν, with ν → ∞ corresponding to SNE and ν = 1 corresponding to standard t-SNE. Using theoretical analysis and toy examples, we show that ν < 1 can further reduce the crowding problem and reveal finer cluster structure that is invisible in standard t-SNE. We further demonstrate the striking effect of heavier-tailed kernels on large real-life data sets such as MNIST, single-cell RNA-sequencing data, and the HathiTrust library, and use domain knowledge to confirm that the revealed clusters are meaningful. Overall, we argue that modifying the tail heaviness of the t-SNE kernel can yield additional insight into the cluster structure of the data.

Highlights

  • T-distributed stochastic neighbour embedding (t-SNE) [12] and related methods [13,15] are used for data visualisation in many scientific fields dealing with thousands or even millions of high-dimensional samples

  • The idea of t-SNE was to adjust the kernel transforming pairwise low-dimensional distances into affinities: the Gaussian kernel was replaced by the heavy-tailed Cauchy kernel (a t-distribution with ν = 1 degree of freedom), ameliorating the crowding problem

  • Yang et al. argued that gradient descent is not suitable for heavy-tailed symmetric SNE (HSSNE) and suggested an alternative optimisation algorithm; here we demonstrate that the standard t-SNE optimisation works reasonably well across a wide range of values of the tail-heaviness parameter α


Introduction

T-distributed stochastic neighbour embedding (t-SNE) [12] and related methods [13,15] are used for data visualisation in many scientific fields dealing with thousands or even millions of high-dimensional samples. Given that t-SNE (ν = 1) outperforms SNE (ν = ∞), it might be that for some data sets ν < 1 would offer additional insights into the structure of the data. While this seems like a straightforward extension and has already been discussed in the literature [10,18], no efficient implementation of this idea has been available until now. We show that the recent FIt-SNE approximation [9] can be modified to use an arbitrary value of ν, and demonstrate that ν < 1 can reveal 'hidden' structure, invisible with standard t-SNE.
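The kernel family described above can be illustrated with a short sketch. This is not the paper's FIt-SNE implementation; it only evaluates the unnormalised t-distribution affinity k(d) = (1 + d²/ν)^(−(ν+1)/2), which reduces to the Cauchy kernel 1/(1 + d²) at ν = 1 and approaches the Gaussian kernel exp(−d²/2) as ν → ∞ (the function name and the exact normalisation are our own for illustration):

```python
import numpy as np

def heavy_tailed_kernel(d, nu):
    """Unnormalised low-dimensional affinity for a t-distribution
    kernel with `nu` degrees of freedom (illustrative sketch only).
    nu = 1 gives the Cauchy kernel of standard t-SNE;
    nu -> infinity recovers the Gaussian kernel of SNE."""
    return (1.0 + d ** 2 / nu) ** (-(nu + 1) / 2)

d = 5.0  # a large low-dimensional distance
heavier = heavy_tailed_kernel(d, 0.5)   # nu < 1: heavier tails
cauchy = heavy_tailed_kernel(d, 1.0)    # == 1 / (1 + d**2)
gaussian_like = heavy_tailed_kernel(d, 1e6)  # ~= np.exp(-d**2 / 2)

# At large distances, smaller nu keeps the affinity larger, so
# well-separated points repel less and clusters can spread apart.
print(heavier > cauchy > gaussian_like)  # True
```

The comparison at the end shows why ν < 1 can ameliorate crowding further: the heavier tail assigns non-negligible affinity to distant pairs, allowing the embedding to place separate clusters further apart.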

Results
Toy Examples
Mathematical Analysis
Real-Life Data Sets
Related Work
Discussion