A systematic performance evaluation of clustering methods for single-cell RNA-seq data.

Angelo Duò, Charlotte Soneson, Mark D Robinson

doi:10.12688/f1000research.15666.3

Open Access

Abstract

Subpopulation identification, usually via some form of unsupervised clustering, is a fundamental step in the analysis of many single-cell RNA-seq data sets. This has motivated the development and application of a broad range of clustering methods, based on various underlying algorithms. Here, we provide a systematic and extensible performance evaluation of 14 clustering algorithms implemented in R, including both methods developed explicitly for scRNA-seq data and more general-purpose methods. The methods were evaluated using nine publicly available scRNA-seq data sets as well as three simulations with varying degrees of cluster separability. The same feature selection approaches were used for all methods, allowing us to focus on the investigation of the performance of the clustering algorithms themselves. We evaluated the ability of the methods to recover known subpopulations, their stability, and their run time and scalability. Additionally, we investigated whether the performance could be improved by generating consensus partitions from multiple individual clustering methods. We found substantial differences in the performance, run time and stability between the methods, with SC3 and Seurat showing the most favorable results. Additionally, we found that consensus clustering typically did not improve the performance compared to the best of the combined methods, but that several of the top-performing methods already perform some type of consensus clustering. All the code used for the evaluation is available on GitHub (https://github.com/markrobinsonuzh/scRNAseq_clustering_comparison). In addition, an R package providing access to data and clustering results, thereby facilitating inclusion of new methods and data sets, is available from Bioconductor (https://bioconductor.org/packages/DuoClustering2018).
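The "ability to recover known subpopulations" described in the abstract amounts to comparing a method's partition of the cells with a reference labeling. The following is a minimal sketch of such a comparison, assuming the adjusted Rand index as the agreement measure (the abstract itself does not name a metric). It is written in Python for illustration only, although the evaluated methods are implemented in R. The function name `adjusted_rand_index` and the toy label vectors are hypothetical and are not taken from the authors' code.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, predicted):
    """Adjusted Rand index between two partitions of the same cells.

    Illustrative helper (hypothetical name), not the authors' implementation.
    """
    if len(truth) != len(predicted):
        raise ValueError("label vectors must have the same length")
    n = len(truth)
    if n < 2:
        return 1.0  # no pairs of cells to compare

    # Contingency counts: number of cells per (true label, predicted label).
    pair_counts = Counter(zip(truth, predicted))
    truth_counts = Counter(truth)
    pred_counts = Counter(predicted)

    # Number of cell pairs placed together in both, truth only, predicted only.
    sum_cells = sum(comb(c, 2) for c in pair_counts.values())
    sum_truth = sum(comb(c, 2) for c in truth_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    # Expected agreement under random labeling with fixed cluster sizes.
    expected = sum_truth * sum_pred / comb(n, 2)
    max_index = (sum_truth + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (sum_cells - expected) / (max_index - expected)


# Toy example: cluster names differ, but the grouping of cells is identical.
known = ["A", "A", "B", "B"]
clustered = [2, 2, 1, 1]
score = adjusted_rand_index(known, clustered)
```

The index depends only on which cells are grouped together, not on the cluster names, so it can compare an unsupervised partition directly with annotated subpopulations. Values near 1 indicate close agreement, and values near 0 indicate agreement no better than chance.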

Highlights

The methods were evaluated using nine publicly available scRNA-seq data sets as well as three simulations with varying degrees of cluster separability
We evaluate 14 clustering algorithms, including methods developed for scRNA-seq data, methods developed for other types of single-cell data, and more general approaches, on a total of 12 different data sets
Large differences in performance across data sets and methods: the 14 methods were tested on real data sets as well as simulations with a varying degree of complexity (Table 1) and across a range of numbers of subpopulations

Summary

Introduction

The methods were evaluated using nine publicly available scRNA-seq data sets as well as three simulations with varying degrees of cluster separability. The same feature selection approaches were used for all methods, allowing us to focus on the investigation of the performance of the clustering algorithms themselves. We evaluated the ability of the methods to recover known subpopulations, their stability, and their run time and scalability. We investigated whether the performance could be improved by generating consensus partitions from multiple individual clustering methods. We found substantial differences in the performance, run time and stability between the methods, with SC3 and Seurat showing the most favorable results. We found that consensus clustering typically did not improve the performance compared to the best of the combined methods, but that several of the top-performing methods already perform some type of consensus clustering.

Journal: F1000Research
Publication Date: Nov 16, 2020
Citations: 222
License type: CC BY 4.0
