Abstract

Motivation: The rapid proliferation of single-cell RNA-sequencing (scRNA-Seq) technologies has spurred the development of diverse computational approaches to detect transcriptionally coherent populations. While the complexity of the algorithms for detecting heterogeneity has increased, most require significant user-tuning, are heavily reliant on dimension reduction techniques and are not scalable to ultra-large datasets. We previously described a multi-step algorithm, Iterative Clustering and Guide-gene Selection (ICGS), which applies intra-gene correlation and hybrid clustering to uniquely resolve novel transcriptionally coherent cell populations from an intuitive graphical user interface.

Results: We describe a new iteration of ICGS that outperforms state-of-the-art scRNA-Seq detection workflows when applied to well-established benchmarks. This approach combines multiple complementary subtype detection methods (HOPACH, sparse non-negative matrix factorization, cluster ‘fitness’, support vector machine) to resolve rare and common cell-states, while minimizing differences due to donor or batch effects. Using data from multiple cell atlases, we show that the PageRank algorithm effectively downsamples ultra-large scRNA-Seq datasets, without losing extremely rare or transcriptionally similar yet distinct cell types and while recovering novel transcriptionally distinct cell populations. We believe this new approach holds tremendous promise in reproducibly resolving hidden cell populations in complex datasets.

Availability and implementation: ICGS2 is implemented in Python. The source code and documentation are available at http://altanalyze.org.

Supplementary information: Supplementary data are available at Bioinformatics online.
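The abstract names PageRank as the downsampling step but does not spell out how the graph is built or how cells are selected. As a rough illustration only, the sketch below assumes a k-nearest-neighbour graph over cells and keeps the highest-ranked ones; the function names (`knn_graph`, `pagerank`, `downsample`), the toy data, and all parameter values are hypothetical and are not taken from ICGS2.

```python
import math
import random


def knn_graph(points, k):
    """Link each point to its k nearest neighbours by Euclidean distance."""
    graph = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        graph.append([j for _, j in dists[:k]])
    return graph


def pagerank(graph, damping=0.85, iterations=100, tol=1e-10):
    """Plain power-iteration PageRank on adjacency lists (every node has out-links)."""
    n = len(graph)
    rank = [1.0 / n] * n
    for _ in range(iterations):
        new = [(1.0 - damping) / n] * n
        for i, neighbours in enumerate(graph):
            share = damping * rank[i] / len(neighbours)
            for j in neighbours:
                new[j] += share
        change = sum(abs(a - b) for a, b in zip(new, rank))
        rank = new
        if change < tol:
            break
    return rank


def downsample(points, n_keep, k=5):
    """Keep the n_keep points with the highest PageRank score (assumed selection rule)."""
    scores = pagerank(knn_graph(points, k))
    order = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:n_keep]), scores


# Toy stand-in for an expression matrix: 200 "common" cells and 10 "rare" cells in 2-D.
random.seed(0)
cells = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
cells += [(random.gauss(8, 0.3), random.gauss(8, 0.3)) for _ in range(10)]

kept, scores = downsample(cells, n_keep=50)
print(len(kept), "of", len(cells), "cells retained")
```

Whether rare populations survive a selection like this depends on how the graph is constructed and how ranked cells are chosen; the sketch makes no claim about reproducing the behaviour reported in the abstract.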

Highlights

  • Recent advances in single cell RNA sequencing provide exciting new opportunities to understand cellular and molecular diversity in healthy tissues and disease

  • Using data from the Human Cell Atlas, we show that the PageRank algorithm effectively downsamples ultra-large scRNA-Seq datasets, without losing extremely rare or transcriptionally similar yet distinct cell types and while recovering novel transcriptionally unique cell populations

  • While the specific algorithms and options used for scRNA-Seq analysis vary significantly among applications, most approaches rely heavily on dimensionality reduction techniques, such as PCA, t-SNE and UMAP, to select features and define cell populations

Introduction

Recent advances in single-cell RNA sequencing (scRNA-Seq) provide exciting new opportunities to understand cellular and molecular diversity in healthy tissues and disease. While the specific algorithms and options used for scRNA-Seq analysis vary significantly among applications, most approaches rely heavily on dimensionality reduction techniques, such as PCA, t-SNE and UMAP, to select features and define cell populations. A number of methods exist to identify clusters from large lower-dimensional projections, including DBSCAN, K-means, affinity propagation, Louvain clustering and spectral clustering, but these and other approaches require proper hyperparameter tuning. Identifying these parameters is non-intuitive and often requires multiple rounds of analysis. The increasing production of atlas-sized datasets highlights the need for highly scalable and automated computational approaches that can rapidly identify common and extremely rare populations with minimal user parameter tweaking [5].
