Benchmark and Parameter Sensitivity Analysis of Single-Cell RNA Sequencing Clustering Methods.

Monika Krzak, Yordan Raykov, Alexis Boukouvalas, Luisa Cutillo, Claudia Angelini

doi:10.3389/fgene.2019.01253

Abstract

Single-cell RNA-seq (scRNAseq) is a powerful tool for studying the heterogeneity of cells. Recently, several clustering-based methods have been proposed to identify distinct cell populations. These methods are based on different statistical models and usually require several additional steps, such as preprocessing or dimension reduction, before the clustering algorithm is applied. Individual steps are often controlled by method-specific parameters, so the same method can be run in different modes on the same dataset, depending on the user's choices. The large number of options that these methods provide can intimidate non-expert users, since the available choices are not always clearly documented. In addition, to date, no large studies have investigated the role and impact of these choices in different experimental contexts. This work aims to provide new insights into the advantages and drawbacks of scRNAseq clustering methods and to describe the range of options offered to users. In particular, we provide an extensive evaluation of several methods with respect to different modes of usage and parameter settings by applying them to real and simulated datasets that vary in dimensionality, number of cell populations, and level of noise. Remarkably, the results presented here show that much of the variability in the performance of the methods is attributable to the choice of user-specified parameter settings. We describe several tendencies in performance associated with the modes of usage and the types of datasets, and identify which methods are strongly affected by data dimensionality in terms of computational time. Finally, we highlight some open challenges in scRNAseq data clustering, such as those related to identifying the number of clusters.

Highlights

Single-cell RNA sequencing has emerged as an important technology that allows profiling gene expression at single-cell resolution, giving new insights into cellular development (Biase et al, 2014; Goolam et al, 2016), dynamics (Vuong et al, 2018; Farbehi et al, 2019), and cell composition (Darmanis et al, 2015; Zeisel et al, 2015; Segerstolpe et al, 2016).
We evaluated the performance of the methods as a function of the number of dimensions supplied to dimension reduction techniques prior to clustering.
We evaluated the performance of the methods in terms of i) Adjusted Rand Index (ARI), ii) accuracy in estimating the correct number of clusters, and iii) running time.
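
The first two evaluation criteria above can be illustrated with a short sketch. The function below computes the Adjusted Rand Index from its standard pair-counting definition using only the Python standard library. The function name `adjusted_rand_index` and the toy label vectors are assumptions made for illustration; this is not the evaluation code used in the study.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two flat labelings of the same cells."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    if n < 2:
        # No pairs of cells to compare; treat as trivial agreement.
        return 1.0

    # Contingency counts: cells shared by each (true, predicted) cluster pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of cell pairs placed together under each view.
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / comb(n, 2)  # agreement expected by chance
    maximum = (sum_true + sum_pred) / 2          # best attainable agreement
    if maximum == expected:
        # Degenerate case, e.g. both labelings put every cell in one cluster.
        return 1.0
    return (sum_pairs - expected) / (maximum - expected)


# Hypothetical toy labelings of four cells.
truth = [0, 0, 1, 1]
same_partition = [1, 1, 0, 0]   # same grouping, different label names
crossed = [0, 1, 0, 1]          # splits every true cluster

print(adjusted_rand_index(truth, same_partition))  # 1.0
print(adjusted_rand_index(truth, crossed))         # -0.5
print(len(set(same_partition)) == len(set(truth)))  # estimated vs. true cluster count
```

The ARI is invariant to how clusters are named, which is why the relabeled partition scores 1.0, and it can fall below zero when agreement is worse than expected by chance.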

Summary

Introduction

Single-cell RNA sequencing (scRNAseq) has emerged as an important technology that allows profiling gene expression at single-cell resolution, giving new insights into cellular development (Biase et al, 2014; Goolam et al, 2016), dynamics (Vuong et al, 2018; Farbehi et al, 2019), and cell composition (Darmanis et al, 2015; Zeisel et al, 2015; Segerstolpe et al, 2016). A growing class of computational methods is being developed for identifying distinct cell populations (Andrews and Hemberg, 2018). These methods are based on various types of clustering techniques, which aim to divide cells into groups that share similar gene expression patterns. Before the clustering algorithm is applied, such methods often require a series of mandatory or optional steps, including preprocessing, filtering, or dimension reduction (Luecken and Theis, 2019). In several cases, the user can adapt these steps by choosing an appropriate set of parameters. Many methods use dimension reduction techniques, such as Principal Component Analysis (PCA) or t-Distributed Stochastic Neighbor Embedding (tSNE), to reduce the high-dimensional space (expression of tens of thousands of genes) prior to clustering (Julia et al, 2015; Herman and Grün, 2018; Ren et al, 2019).
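
As a rough illustration of the "reduce dimensions, then cluster" pattern described above, the sketch below log-transforms a small synthetic count matrix, projects it onto a user-chosen number of principal components, and clusters the result with a plain k-means. Everything here is an assumption made for illustration: the simulated Poisson counts, the `log1p` transform, the farthest-point initialization, and the function names. It is not the implementation of any of the benchmarked methods.

```python
import numpy as np


def simulate_counts(n_per_group=20, n_genes=50, n_groups=3, seed=0):
    """Hypothetical toy data: each group over-expresses its own 10 marker genes."""
    rng = np.random.default_rng(seed)
    truth = np.repeat(np.arange(n_groups), n_per_group)
    rates = np.ones((n_groups * n_per_group, n_genes))
    for g in range(n_groups):
        rates[truth == g, g * 10:(g + 1) * 10] = 50.0
    return rng.poisson(rates), truth


def pca_reduce(expr, n_components):
    """Project cells (rows) onto the top principal components via SVD."""
    centered = expr - expr.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def kmeans(points, k, n_iter=50):
    """Lloyd's k-means with a deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        # Distance of every point to its nearest already-chosen center.
        nearest = np.min(
            [np.linalg.norm(points - c, axis=1) for c in centers], axis=0
        )
        centers.append(points[nearest.argmax()])
    centers = np.array(centers)

    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def cluster_with_pca(counts, n_components, k):
    """Log-transform counts, reduce with PCA, then cluster the reduced cells."""
    reduced = pca_reduce(np.log1p(counts), n_components)
    return kmeans(reduced, k)


counts, truth = simulate_counts()
# The number of retained dimensions is the user-facing parameter being varied.
for n_components in (2, 5, 10):
    labels = cluster_with_pca(counts, n_components=n_components, k=3)
    print(n_components, np.bincount(labels))
```

On real data the groups are far less cleanly separated than in this toy matrix, so the number of retained components can change the clustering result noticeably.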



Journal: Frontiers in Genetics
Publication Date: Dec 11, 2019
Citations: 65
License type: CC BY 4.0


