Abstract

Recent single-cell transcriptomic studies revealed new insights into cell-type heterogeneities in cellular microenvironments unavailable from bulk studies. A significant drawback of currently available algorithms is the need to use empirical parameters or rely on indirect quality measures to estimate the degree of complexity, i.e., the number of subgroups present in the sample. We fill this gap with a single-cell data analysis procedure allowing for unambiguous assessments of the depth of heterogeneity in subclonal compositions supported by data. Our approach combines nonnegative matrix factorization, which takes advantage of the sparse and nonnegative nature of single-cell RNA count data, with Bayesian model comparison enabling de novo prediction of the depth of heterogeneity. We show that the method predicts the correct number of subgroups using simulated data, primary blood mononuclear cell, and pancreatic cell data. We applied our approach to a collection of single-cell tumor samples and found two qualitatively distinct classes of cell-type heterogeneity in cancer microenvironments.

Highlights

  • We examined the quality measures of maximum-likelihood nonnegative matrix factorization (ML-NMF) for two representative tumor samples, one from each of the type I and type II classes.

Introduction

Gene expression heterogeneities at the level of individual cells reflect key biological features not apparent from bulk properties, promising novel insights into the molecular mechanisms underlying, e.g., the development of neurons (Poulin et al, 2016), stem cell biology (Wen & Tang, 2016), and cancer (Navin, 2015; Winterhoff et al, 2017; Cieslik & Chinnaiyan, 2018; Nguyen et al, 2018). Although classical unsupervised clustering and more recent dimensional reduction methods have been successfully adapted to single-cell RNA-seq data (Grün et al, 2015; Macosko et al, 2015; Bacher & Kendziorski, 2016; Li et al, 2017), a common drawback is the need to specify the degree of complexity in clustering, either by fixing the total number of subgroups anticipated or by choosing a resolution parameter controlling the extent of dimensional reduction. Because the degree of cell-type diversity expected from data is often unknown in real applications, a clustering approach capable of inferring the number of cell types present in a sample solely from statistical evidence would provide a significant advantage, freeing the cell-type classification and discovery process from potential resolution bias.
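The rank-selection problem described above can be illustrated with a minimal sketch. It is not the authors' implementation: it swaps in plain multiplicative-update NMF that minimizes the generalized Kullback-Leibler divergence (equivalent to maximizing a Poisson likelihood for count data), and it does not perform any Bayesian model comparison. The helper names `simulate_counts`, `kl_divergence`, and `kl_nmf`, the simulated data, and all parameter values are assumptions made for illustration only.

```python
import numpy as np


def simulate_counts(n_genes=60, cells_per_group=30, n_groups=3, seed=1):
    """Simulate a genes x cells count matrix with a known number of groups."""
    rng = np.random.default_rng(seed)
    # One mean-expression profile per group (hypothetical gamma-distributed means).
    profiles = rng.gamma(shape=0.5, scale=4.0, size=(n_genes, n_groups))
    labels = np.repeat(np.arange(n_groups), cells_per_group)
    counts = rng.poisson(profiles[:, labels])
    return counts.astype(float), labels


def kl_divergence(X, M):
    """Generalized KL divergence D(X || M), using the convention 0 * log(0) = 0."""
    mask = X > 0
    return float(np.sum(X[mask] * np.log(X[mask] / M[mask])) - X.sum() + M.sum())


def kl_nmf(X, rank, n_iter=200, seed=0, eps=1e-10):
    """Factor X ~ W @ H with nonnegative W, H via multiplicative KL updates."""
    rng = np.random.default_rng(seed)
    n_genes, n_cells = X.shape
    W = rng.random((n_genes, rank)) + eps
    H = rng.random((rank, n_cells)) + eps
    losses = []
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (X / WH)) / (W.sum(axis=0)[:, None] + eps)
        WH = W @ H + eps
        W *= ((X / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
        losses.append(kl_divergence(X, W @ H + eps))
    return W, H, losses


X, labels = simulate_counts()
final_loss = {}
for rank in range(1, 7):
    W, H, losses = kl_nmf(X, rank)
    final_loss[rank] = losses[-1]
    print(f"rank={rank}  final KL divergence={final_loss[rank]:.1f}")
```

In a sketch like this, the fit typically keeps improving as the rank grows past the simulated number of groups, because extra components absorb noise. Goodness of fit alone therefore does not identify the number of subgroups, which is the gap a model-comparison criterion is meant to fill.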
