The art of using t-SNE for single-cell transcriptomics

Dmitry Kobak, Philipp Berens

doi:10.1038/s41467-019-13056-x

Abstract

Single-cell transcriptomics yields ever growing data sets containing RNA expression levels for thousands of genes from up to millions of cells. Common data analysis pipelines include a dimensionality reduction step for visualising the data in two dimensions, most frequently performed using t-distributed stochastic neighbour embedding (t-SNE). It excels at revealing local structure in high-dimensional data, but naive applications often suffer from severe shortcomings, e.g. the global structure of the data is not represented accurately. Here we describe how to circumvent such pitfalls, and develop a protocol for creating more faithful t-SNE visualisations. It includes PCA initialisation, a high learning rate, and multi-scale similarity kernels; for very large data sets, we additionally use exaggeration and downsampling-based initialisation. We use published single-cell RNA-seq data sets to demonstrate that this protocol yields superior results compared to the naive application of t-SNE.

Highlights

Single-cell transcriptomics yields ever growing data sets containing RNA expression levels for thousands of genes from up to millions of cells
Our analysis indicates that UMAP does not necessarily solve t-distributed stochastic neighbour embedding (t-SNE)’s problems out of the box and might require as many careful parameter and/or initialisation choices as t-SNE does
We showed that using informative initialisation can substantially improve the global structure of the final embedding because it survives through the optimisation process
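As a rough illustration of what an informative initialisation could look like, the sketch below projects the data onto its first two principal components and rescales the result to a small spread before it is handed to a t-SNE optimiser. The function name `pca_initialisation`, the `target_std` value of 1e-4, and the random stand-in data are assumptions made for this example and are not taken from the text above.

```python
import numpy as np


def pca_initialisation(X, n_components=2, target_std=1e-4):
    """Project X onto its leading principal components and shrink the result.

    Hypothetical helper: the small target_std is an assumption, chosen so that
    the starting layout is compact, as is usual for t-SNE starting points.
    """
    # Centre the features so that the SVD yields principal components.
    X_centred = X - X.mean(axis=0)
    # Thin SVD: the columns of U scaled by S are the principal component scores.
    U, S, _ = np.linalg.svd(X_centred, full_matrices=False)
    Y = U[:, :n_components] * S[:n_components]
    # Rescale so that the first component has standard deviation target_std.
    return Y / Y[:, 0].std() * target_std


# Random stand-in for a cells-by-features matrix (not real expression data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
init = pca_initialisation(X)
```

The resulting array has one row per cell and two columns, so it can be supplied as the starting layout to t-SNE implementations that accept an explicit initial embedding.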

Summary

Introduction

Single-cell transcriptomics yields ever growing data sets containing RNA expression levels for thousands of genes from up to millions of cells. Common data analysis pipelines include a dimensionality reduction step for visualising the data in two dimensions, most frequently performed using t-distributed stochastic neighbour embedding (t-SNE). Naive applications of t-SNE often suffer from severe shortcomings, e.g. the global structure of the data is not represented accurately. We describe how to circumvent such pitfalls, and develop a protocol for creating more faithful t-SNE visualisations. It includes PCA initialisation, a high learning rate, and multi-scale similarity kernels; for very large data sets, we additionally use exaggeration and downsampling-based initialisation. Through improved experimental techniques it has become possible to obtain gene expression data from thousands or even millions of cells [3,4,5,6,7,8]. Computational analysis of such data sets often entails unsupervised, exploratory steps including dimensionality reduction for visualisation. We use FIt-SNE [16], a recently developed fast t-SNE implementation, for all experiments.
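To make the idea of a multi-scale similarity kernel more concrete, the sketch below averages Gaussian affinities calibrated to two different perplexities and then symmetrises them. This is one simple way to combine scales and is not necessarily the exact kernel used by the authors or by FIt-SNE. The function names, the perplexity values (5 and 15) and the random stand-in data are assumptions for illustration only.

```python
import numpy as np


def conditional_affinities(X, perplexity, n_steps=50):
    """Row-normalised Gaussian affinities with a per-point bandwidth.

    For each point, a bisection search picks the kernel precision (beta)
    so that the perplexity of its row approximately matches the target.
    """
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances, clipped at zero against round-off.
    D = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    P = np.zeros((n, n))
    target_entropy = np.log(perplexity)
    off_diagonal = ~np.eye(n, dtype=bool)
    for i in range(n):
        d = D[i, off_diagonal[i]]
        d = d - d.min()  # shift for numerical stability; cancels on normalising
        lo, hi, beta = 0.0, np.inf, 1.0
        for _ in range(n_steps):
            p = np.exp(-beta * d)
            p /= p.sum()
            entropy = -np.sum(p * np.log(p + 1e-12))
            if entropy > target_entropy:
                # Row is too flat: narrow the kernel by increasing beta.
                lo = beta
                beta = beta * 2.0 if np.isinf(hi) else 0.5 * (lo + hi)
            else:
                # Row is too peaked: widen the kernel by decreasing beta.
                hi = beta
                beta = 0.5 * (lo + hi)
        P[i, off_diagonal[i]] = p
    return P


def multiscale_affinities(X, perplexities=(5, 15)):
    """Average the conditional affinities over several scales, then symmetrise."""
    P = np.mean([conditional_affinities(X, p) for p in perplexities], axis=0)
    return (P + P.T) / (2.0 * X.shape[0])


# Random stand-in data (not real expression data).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 10))
P_multi = multiscale_affinities(X)
```

The small perplexity keeps the fine neighbourhood structure, while the larger one adds weight to more distant points, which is the intuition behind combining scales. For realistic data set sizes, a nearest-neighbour approximation would replace the dense distance matrix used here.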

Journal: Nature Communications
Publication Date: Nov 28, 2019
Citations: 687
License type: open-access

Similar Papers

Dynamic Interrogation of Stochastic Transcriptome Trajectories (DIST2)
Elizabeth B Torres ... Simon Schafer
02 Feb 2020

K-means discriminant maps for data visualization and classification
Vo Dinh Minh Nhat ... Sungyoung Lee
16 Mar 2008

DATA DIMENSIONALITY REDUCTION THROUGH CLUSTER TREES AND MANIFOLD LEARNING
Ali Amani
01 Jan 2020

Branching and Circular Features in High Dimensional Data
Bei Wang ... M Vejdemo-Johansson
IEEE Transactions on Visualization and Computer Graphics | VOL. 17
01 Dec 2011
