Abstract
T-distributed stochastic neighbour embedding (t-SNE) is a widely used data visualisation technique. It differs from its predecessor SNE by the low-dimensional similarity kernel: the Gaussian kernel was replaced by the heavy-tailed Cauchy kernel, solving the 'crowding problem' of SNE. Here, we develop an efficient implementation of t-SNE for a t-distribution kernel with an arbitrary degree of freedom ν, with ν → ∞ corresponding to SNE and ν = 1 corresponding to the standard t-SNE. Using theoretical analysis and toy examples, we show that ν < 1 can further reduce the crowding problem and reveal finer cluster structure that is invisible in standard t-SNE. We further demonstrate the striking effect of heavier-tailed kernels on large real-life data sets such as MNIST, single-cell RNA-sequencing data, and the HathiTrust library. We use domain knowledge to confirm that the revealed clusters are meaningful. Overall, we argue that modifying the tail heaviness of the t-SNE kernel can yield additional insight into the cluster structure of the data.
Highlights
T-distributed stochastic neighbour embedding (t-SNE) [12] and related methods [13,15] are used for data visualisation in many scientific fields dealing with thousands or even millions of high-dimensional samples.
The key idea of t-SNE was to adjust the kernel transforming pairwise low-dimensional distances into affinities: the Gaussian kernel was replaced by the heavy-tailed Cauchy kernel (a t-distribution with one degree of freedom, ν = 1), ameliorating the crowding problem.
Yang et al. argued that gradient descent is not suitable for heavy-tailed symmetric SNE (HSSNE) and suggested an alternative optimisation algorithm; here we demonstrate that the standard t-SNE optimisation works reasonably well across a wide range of α values.
Summary
T-distributed stochastic neighbour embedding (t-SNE) [12] and related methods [13,15] are used for data visualisation in many scientific fields dealing with thousands or even millions of high-dimensional samples. Given that t-SNE (ν = 1) outperforms SNE (ν = ∞), it might be that for some data sets ν < 1 would offer additional insights into the structure of the data. While this seems like a straightforward extension and has already been discussed in the literature [10,18], no efficient implementation of this idea has been available until now. We show that the recent FIt-SNE approximation [9] can be modified to use an arbitrary value of ν and demonstrate that ν < 1 can reveal ‘hidden’ structure, invisible with standard t-SNE.