Abstract
T-distributed stochastic neighbour embedding (t-SNE) is a widely used data visualisation technique. It differs from its predecessor SNE by the low-dimensional similarity kernel: the Gaussian kernel was replaced by the heavy-tailed Cauchy kernel, solving the 'crowding problem' of SNE. Here, we develop an efficient implementation of t-SNE for a t-distribution kernel with an arbitrary degree of freedom ν, with ν → ∞ corresponding to SNE and ν = 1 corresponding to the standard t-SNE. Using theoretical analysis and toy examples, we show that ν < 1 can further reduce the crowding problem and reveal finer cluster structure that is invisible in standard t-SNE. We further demonstrate the striking effect of heavier-tailed kernels on large real-life data sets such as MNIST, single-cell RNA-sequencing data, and the HathiTrust library. We use domain knowledge to confirm that the revealed clusters are meaningful. Overall, we argue that modifying the tail heaviness of the t-SNE kernel can yield additional insight into the cluster structure of the data.
Highlights
T-distributed stochastic neighbour embedding (t-SNE) [12] and related methods [13,15] are used for data visualisation in many scientific fields dealing with thousands or even millions of high-dimensional samples.
The key idea of t-SNE was to adjust the kernel transforming pairwise low-dimensional distances into affinities: the Gaussian kernel was replaced by the heavy-tailed Cauchy kernel (a t-distribution with one degree of freedom, ν = 1), ameliorating the crowding problem.
Yang et al. argued that gradient descent is not suitable for heavy-tailed symmetric SNE (HSSNE) and suggested an alternative optimisation algorithm; here we demonstrate that the standard t-SNE optimisation works reasonably well across a wide range of α values.
Summary
T-distributed stochastic neighbour embedding (t-SNE) [12] and related methods [13,15] are used for data visualisation in many scientific fields dealing with thousands or even millions of high-dimensional samples. Given that t-SNE (ν = 1) outperforms SNE (ν = ∞), it might be that for some data sets ν < 1 would offer additional insights into the structure of the data. While this seems like a straightforward extension and has already been discussed in the literature [10,18], no efficient implementation of this idea has been available until now. We show that the recent FIt-SNE approximation [9] can be modified to use an arbitrary value of ν and demonstrate that ν < 1 can reveal ‘hidden’ structure, invisible with standard t-SNE.