A Tool for Visualization and Analysis of Single-Cell RNA-Seq Data Based on Text Mining

Gennaro Gambardella, Diego Di Bernardo

doi:10.3389/fgene.2019.00734

Abstract

Gene expression in individual cells can now be measured for thousands of cells in a single experiment thanks to innovative sample-preparation and sequencing technologies. State-of-the-art computational pipelines for single-cell RNA-sequencing data, however, still employ computational methods that were developed for traditional bulk RNA-sequencing data, thus not accounting for the peculiarities of single-cell data, such as sparseness and zero-inflated counts. Here, we present a ready-to-use pipeline named gf-icf (gene frequency–inverse cell frequency) for normalization of raw counts, feature selection, and dimensionality reduction of scRNA-seq data for their visualization and subsequent analyses. Our work is based on a data transformation model named term frequency–inverse document frequency (TF-IDF), which has been extensively used in the field of text mining where extremely sparse and zero-inflated data are common. Using benchmark scRNA-seq datasets, we show that the gf-icf pipeline outperforms existing state-of-the-art methods in terms of improved visualization and ability to separate and distinguish different cell types.
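The TF-IDF idea carries over to a cells-by-genes count matrix if each cell is treated as a document and each gene as a term. The sketch below is a minimal illustration in Python with NumPy. It is not the authors' gf-icf implementation. Several details are assumptions made here: the function name `gf_icf`, the smoothed inverse-cell-frequency formula (borrowed from scikit-learn's `TfidfTransformer` with `smooth_idf=True`), and the final per-cell L2 normalization.

```python
import numpy as np


def gf_icf(counts):
    """Hypothetical sketch of a gf-icf transform for a cells x genes count matrix.

    Assumptions (not taken from the paper): gene frequency is each count divided
    by the cell's total counts; inverse cell frequency is ln((1 + N) / (1 + n_g)) + 1,
    with N cells and n_g cells having a non-zero count for gene g; each cell's
    score vector is then L2-normalized.
    """
    counts = np.asarray(counts, dtype=float)
    n_cells = counts.shape[0]

    # Gene frequency: fraction of each cell's counts assigned to each gene.
    totals = counts.sum(axis=1, keepdims=True)
    gf = np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

    # Inverse cell frequency: genes detected in fewer cells receive larger weights.
    n_expressing = (counts > 0).sum(axis=0)
    icf = np.log((1.0 + n_cells) / (1.0 + n_expressing)) + 1.0

    scores = gf * icf  # icf is broadcast across all cells

    # L2-normalize each cell so cells are compared by profile, not by scale.
    norms = np.linalg.norm(scores, axis=1, keepdims=True)
    return np.divide(scores, norms, out=np.zeros_like(scores), where=norms > 0)


if __name__ == "__main__":
    # Toy data: 3 cells x 3 genes. Gene 0 is detected in every cell,
    # genes 1 and 2 each in a single cell.
    X = np.array([[5, 5, 0],
                  [5, 0, 5],
                  [5, 0, 0]])
    S = gf_icf(X)
    # In cell 0 genes 0 and 1 have equal counts, but the cell-specific gene 1 scores higher.
    print(S[0, 1] > S[0, 0])  # True
```

In this sketch the zero-heavy structure of the data is handled directly. A gene's weight depends on how many cells detect it at all, so a gene seen everywhere is down-weighted without any variance estimate.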

Highlights

Until very recently, the cost, time, and equipment needed to perform single-cell transcriptomics have limited its application to a few selected studies
We aimed to develop a computational tool that could integrate single-cell transcriptional profiles across multiple conditions by extracting relevant genes to improve data visualization and cell type identification
The intuition behind applying the term frequency–inverse document frequency (TF-IDF) approach to scRNA-seq data is that a gene highly expressed in a cell should be scored higher than less expressed genes in the same cell, while highly expressed genes common to many cells of different types should be scored lower than genes expressed in a specific subpopulation of cells
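A worked instance makes this intuition concrete. It assumes the smoothed inverse-document-frequency formula of scikit-learn's `TfidfTransformer` (`smooth_idf=True`), where N is the number of cells and n_g the number of cells expressing gene g. The paper's exact weighting may differ. Genes A and B have the same gene frequency (0.01) in a given cell c. However, A is detected in all N = 100 cells and B in only 10 of them:

```latex
\begin{aligned}
\mathrm{icf}(g) &= \ln\frac{1+N}{1+n_g} + 1, &\quad \mathrm{score}(c,g) &= \mathrm{gf}(c,g)\,\mathrm{icf}(g) \\
\mathrm{icf}(A) &= \ln\frac{101}{101} + 1 = 1, &\quad \mathrm{score}(c,A) &= 0.01 \times 1 = 0.010 \\
\mathrm{icf}(B) &= \ln\frac{101}{11} + 1 \approx 3.22, &\quad \mathrm{score}(c,B) &\approx 0.01 \times 3.22 \approx 0.032
\end{aligned}
```

Gene B marks a subpopulation and receives roughly three times the weight of the ubiquitous gene A, even though both genes account for the same share of the cell's counts.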

Summary

Introduction

Until very recently, the cost, time, and equipment needed to perform single-cell transcriptomics have limited its application to a few selected studies. State-of-the-art computational pipelines for scRNA-seq data visualization consist of four main steps (Trapnell et al., 2014; Klein et al., 2015; Macosko et al., 2015; Shekhar et al., 2016; Zheng et al., 2017; Butler et al., 2018): i) normalization of raw counts by sample-specific size factors; ii) feature selection by identifying the most variable genes across cells; iii) dimensionality reduction with principal component analysis (PCA); and iv) projection of the scRNA-seq data into an embedded space, for example with t-SNE or UMAP (van der Maaten and Hinton, 2008; McInnes and Healy, 2018). Most steps in these pipelines still employ computational methods that were developed for traditional bulk RNA-seq data. These methods do not account for the high level of noise caused by dropouts, which leads to an excess of zero and near-zero counts in the dataset.
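For comparison, steps i-iii of the conventional pipeline can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions and does not reproduce the procedure of any specific cited tool. The function name `conventional_pipeline`, the median-based size factors, the log1p transform, and variance-based gene ranking are all choices made here for brevity.

```python
import numpy as np


def conventional_pipeline(counts, n_genes=2000, n_pcs=50):
    """Hypothetical sketch of steps i-iii for a cells x genes raw count matrix.

    Illustrative assumptions: every cell has at least one count; size factors
    are cell totals divided by their median; normalized counts are log1p-transformed;
    genes are ranked by plain variance. Published tools use refined variants.
    """
    counts = np.asarray(counts, dtype=float)

    # i) Normalize raw counts by a cell-specific size factor, then log-transform.
    totals = counts.sum(axis=1)
    size_factors = totals / np.median(totals)
    log_norm = np.log1p(counts / size_factors[:, None])

    # ii) Feature selection: keep the genes with the highest variance across cells.
    n_genes = min(n_genes, log_norm.shape[1])
    top_genes = np.argsort(log_norm.var(axis=0))[::-1][:n_genes]
    selected = log_norm[:, top_genes]

    # iii) PCA via singular value decomposition of the column-centered matrix.
    centered = selected - selected.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    n_pcs = min(n_pcs, s.size)
    return u[:, :n_pcs] * s[:n_pcs]  # cell coordinates on the leading PCs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    toy_counts = rng.poisson(2.0, size=(30, 100))  # synthetic counts: 30 cells x 100 genes
    pcs = conventional_pipeline(toy_counts, n_genes=20, n_pcs=5)
    print(pcs.shape)  # (30, 5)
```

Step iv would pass the returned PC coordinates to a nonlinear embedding, such as scikit-learn's `sklearn.manifold.TSNE` or the `UMAP` class from the `umap-learn` package. That step is omitted here.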

Journal: Frontiers in Genetics
Publication Date: Aug 9, 2019
Citations: 17
License type: CC BY 4.0

Similar Papers

FastProject: a tool for low-dimensional analysis of single-cell RNA-Seq data
David Detomaso ... Nir Yosef
BMC Bioinformatics | VOL. 17 | 23 Aug 2016

Visualizing High-Dimensional Single-Cell RNA-seq Data via Random Projections and Geodesic Distances
Aristidis G Vrahatis ... Sotiris K Tasoulis
01 Jul 2019

Discrete distributional differential expression (D3E)--a tool for gene expression analysis of single-cell RNA-seq data.
Mihails Delmans ... Martin Hemberg
BMC Bioinformatics | VOL. 17 | 29 Feb 2016

ScTPA: a web tool for single-cell transcriptome analysis of pathway activation signatures.
Yan Zhang ... Fangjie Guo
Bioinformatics | VOL. 36 | 21 May 2020
