Abstract

Single-cell analysis is a powerful tool for dissecting the cellular composition within a tissue or organ. However, it remains difficult to detect rare and common cell types at the same time. Here, we present a new computational method, GiniClust2, to overcome this challenge. GiniClust2 combines the strengths of two complementary approaches, using the Gini index and Fano factor, respectively, through a cluster-aware, weighted ensemble clustering technique. GiniClust2 successfully identifies both common and rare cell types in diverse datasets, outperforming existing methods. GiniClust2 is scalable to large datasets.

Highlights

  • Genome-wide transcriptomic profiling has served as a paradigm for the systematic characterization of molecular signatures associated with biological functions and disease-related alterations, but traditionally this could only be done using bulk samples that often contain significant cellular heterogeneity

  • Given each method’s distinct feature space, we find that GiniClust and Fano factor-based k-means tend to emphasize the accurate clustering of rare and common cell types, respectively, at the expense of their complements

  • Since GiniClust is more accurate for detecting rare clusters, its outcome is more highly weighted for rare cluster assignments, while Fano factor-based k-means is more accurate for detecting common clusters and its outcome is more highly weighted for common cluster assignments

Read more

Summary

Introduction

Genome-wide transcriptomic profiling has served as a paradigm for the systematic characterization of molecular signatures associated with biological functions and disease-related alterations, but traditionally this could only be done using bulk samples that often contain significant cellular heterogeneity. The recent development of singlecell technologies has enabled biologists to dissect cellular heterogeneity within a cell population. Such efforts have led to an increased understanding of cell-type composition, lineage relationships, and mechanisms underlying cell-fate transitions. As the throughput of single-cell technology increases dramatically, it has become feasible to characterize major cell types, and to detect cells that are present at low frequencies, including those that are known to play an important role in development and disease, such as stem and progenitor cells, cancerinitiating cells, and drug-resistant cells [1, 2]. Despite the intensive effort in method development [3,4,5,6,7,8], significant limitations remain.

Objectives
Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call