Single cell RNA-seq data clustering using TF-IDF based methods

Marmar Moussa, Ion I Măndoiu

doi:10.1186/s12864-018-4922-4

Open Access

Journal: BMC Genomics
Publication Date: Aug 1, 2018
Citations: 25
License type: open-access

Affiliation: University of Connecticut

Abstract

Background: Single cell transcriptomics is critical for understanding cellular heterogeneity and identification of novel cell types. Leveraging the recent advances in single cell RNA sequencing (scRNA-Seq) technology requires novel unsupervised clustering algorithms that are robust to high levels of technical and biological noise and scale to datasets of millions of cells.

Results: We present novel computational approaches for clustering scRNA-seq data based on the Term Frequency-Inverse Document Frequency (TF-IDF) transformation that has been successfully used in the field of text analysis.

Conclusions: Empirical experimental results show that TF-IDF methods consistently outperform commonly used scRNA-Seq clustering approaches.
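The abstract does not spell out which TF-IDF variant or which clustering step the authors use. As a minimal sketch only, assuming the textbook text-mining definitions (cells as documents, genes as terms, tf = gene count / library size, idf = log(N / number of cells detecting the gene)) and swapping in plain k-means on L2-normalized TF-IDF vectors as the clustering step, the idea looks like this in Python with NumPy. The function names `tf_idf`, `l2_normalize`, `kmeans` and `cluster_cells`, and the toy data, are hypothetical and are not the authors' code.

```python
import numpy as np


def tf_idf(counts: np.ndarray) -> np.ndarray:
    """TF-IDF of a cells x genes count matrix (textbook variant, an assumption).

    Cells play the role of documents and genes the role of terms:
    tf  = gene count / total counts in the cell
    idf = log(n_cells / number of cells with a non-zero count for the gene)
    """
    counts = np.asarray(counts, dtype=float)
    n_cells = counts.shape[0]
    # Term frequency: normalize each cell by its library size (guard empty cells).
    lib_size = np.maximum(counts.sum(axis=1, keepdims=True), 1.0)
    tf = counts / lib_size
    # Document frequency: how many cells detect each gene (guard undetected genes).
    df = np.maximum((counts > 0).sum(axis=0), 1)
    # Genes detected in every cell get idf = 0, like stop words in text.
    idf = np.log(n_cells / df)
    return tf * idf


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so Euclidean distance tracks cosine distance."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)


def kmeans(x: np.ndarray, k: int, n_iter: int = 100) -> np.ndarray:
    """Plain Lloyd's k-means with farthest-first initialization (hypothetical helper)."""
    centers = [x[0]]
    for _ in range(1, k):
        # Next center: the row farthest from all centers chosen so far.
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[d.argmax()])
    centers = np.array(centers)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        # Assign each cell to its nearest center (squared Euclidean distance).
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster becomes empty.
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def cluster_cells(counts: np.ndarray, k: int) -> np.ndarray:
    """TF-IDF transform, row normalization, then k-means into k clusters."""
    return kmeans(l2_normalize(tf_idf(counts)), k)


# Toy data: 30 cells of a hypothetical type A (high in genes 0-2) and
# 20 cells of type B (high in genes 3-5); gene 6 is at similar levels in both.
rng = np.random.default_rng(0)
type_a = rng.poisson([20, 20, 20, 1, 1, 1, 10], size=(30, 7))
type_b = rng.poisson([1, 1, 1, 20, 20, 20, 10], size=(20, 7))
counts = np.vstack([type_a, type_b])
labels = cluster_cells(counts, k=2)
print(labels)
```

With this idf, a gene detected in every cell gets weight zero, much as a stop word does in a text corpus, so the clustering is driven by genes that mark a subset of cells. Row normalization followed by Euclidean k-means is one common way to approximate cosine-based clustering; whether the paper's methods use this particular combination is not stated here.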

Highlights

Each of the 36 clustering algorithms described in the Methods section was run on 2-class synthetic mixtures of 1,000 cells, sampled in different ratios from six pairs of immune cell types, as described in the Experimental setup section
Each plot shows the median of the corresponding measure as the middle horizontal line and the mean as a middle point, with the means connected by lines across methods
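The experimental setup is only summarized above. A minimal sketch of drawing one 2-class mixture of 1,000 cells, assuming cells are sampled without replacement from two pools of single-type cells and that the mixing ratio is given as the fraction of the first type. The function name `sample_mixture`, the pool sizes, and the 20:80 ratio in the example are hypothetical, not values taken from the paper.

```python
import numpy as np


def sample_mixture(n_pool_a, n_pool_b, frac_a, n_cells=1000, seed=0):
    """Draw a 2-class mixture of n_cells cells, a fraction frac_a of them from pool A.

    Returns the sampled row indices into each pool and the ground-truth labels
    (0 = type A, 1 = type B) that a clustering can later be scored against.
    """
    rng = np.random.default_rng(seed)
    n_a = int(round(frac_a * n_cells))
    n_b = n_cells - n_a
    if n_a > n_pool_a or n_b > n_pool_b:
        raise ValueError("pool too small for the requested mixture")
    # Sample without replacement so no cell appears twice in one mixture.
    idx_a = rng.choice(n_pool_a, size=n_a, replace=False)
    idx_b = rng.choice(n_pool_b, size=n_b, replace=False)
    labels = np.concatenate([np.zeros(n_a, dtype=int), np.ones(n_b, dtype=int)])
    return idx_a, idx_b, labels


# Hypothetical pools of 5,000 and 8,000 cells, mixed 20:80.
idx_a, idx_b, truth = sample_mixture(5000, 8000, frac_a=0.2)
print(len(idx_a), len(idx_b), truth.sum())  # 200 800 800
```

Repeating this over several ratios and several cell-type pairs yields a family of labeled benchmarks on which clustering accuracy measures can be summarized per method, as in the plots described above.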

Summary

Introduction

Recent advances in single cell RNA sequencing (scRNA-Seq) technologies promise to unveil novel cell types and uncover subtle regulatory processes that are undetectable by analyzing bulk samples. Droplet-based technologies such as the Chromium Megacell commercialized by 10x Genomics can quickly and inexpensively generate scRNA-Seq expression profiles for up to millions of cells. The large amounts of data and high levels of noise render many unsupervised clustering methods developed for bulk gene expression data [1] unusable, prompting the development of a new generation of clustering tools that are robust to technical and biological noise and scale to datasets of millions of cells.

Similar Papers

Decision letter: Single-cell RNA sequencing of the Strongylocentrotus purpuratus larva reveals the blueprint of major cell types and nervous system of a non-chordate deuterostome
Veronica Hinman ... Marianne E Bronner
06 Jul 2021

Molecular taxonomy of nociceptors and pruriceptors
Jussi Kupari ... Patrik Ernfors
Pain | VOL. 164 | 25 Jan 2023

Single-cell RNA sequencing in cardiovascular development, disease and medicine
David T Paik ... Howard Y Chang
Nature Reviews Cardiology | VOL. 17 | 30 Mar 2020

Single-cell RNA sequencing in diabetic kidney disease: a literature review
Wei Tan ... Jurong Yang
Renal Failure | VOL. 46 | 04 Aug 2024
