Abstract

Motivation: Single-cell RNA-seq allows researchers to identify cell populations through unsupervised clustering of the transcriptome. However, subpopulations can differ only subtly at the transcriptomic level, and the high dimensionality of the data makes them difficult to identify.

Results: We introduce ILoReg, an R package implementing a new cell population identification method that improves the identification of cell populations with subtle differences through a probabilistic feature extraction step applied before clustering and visualization. The feature extraction is performed with a novel machine learning algorithm, iterative clustering projection (ICP), which uses logistic regression and clustering similarity comparison to cluster the data iteratively. ICP also integrates feature selection into the clustering through L1 regularization, which enables the identification of genes that are differentially expressed between cell populations. By combining the solutions of multiple ICP runs into a single consensus solution, ILoReg creates a representation that allows cell populations to be investigated at high resolution. In particular, we show that the ILoReg visualization separates immune and pancreatic cell populations more distinctly than current state-of-the-art methods.

Availability and implementation: ILoReg is available as an R package at https://bioconductor.org/packages/ILoReg.

Supplementary information: Supplementary data are available at Bioinformatics online.
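The abstract does not say how the solutions of multiple ICP runs are merged. The Python sketch below shows one plausible construction, assuming each run yields a cells-by-k matrix of predicted cluster probabilities: the matrices are concatenated column-wise and reduced with PCA, and downstream clustering or visualization would then operate on the per-cell scores. The function name `consensus_embedding`, the use of PCA and the simulated Dirichlet input are assumptions for illustration, not ILoReg's R code.

```python
import numpy as np


def consensus_embedding(prob_matrices, n_components=10):
    """Merge per-run cell-by-cluster probability matrices and project them onto principal components.

    Assumption of this sketch: concatenation followed by PCA is used as the consensus step.
    """
    consensus = np.hstack(prob_matrices)  # cells x (runs * k)
    centered = consensus - consensus.mean(axis=0)
    U, S, _ = np.linalg.svd(centered, full_matrices=False)
    n_components = min(n_components, S.size)
    return U[:, :n_components] * S[:n_components]  # PCA scores, one row per cell


# Hypothetical input: five runs, 100 cells, k = 4 clusters per run.
rng = np.random.default_rng(0)
runs = [rng.dirichlet(np.ones(4), size=100) for _ in range(5)]
embedding = consensus_embedding(runs, n_components=10)
print(embedding.shape)  # (100, 10)
```

Concatenating the runs rather than averaging them sidesteps the need to match cluster labels across runs, because the k labels produced by each run are arbitrary.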

Highlights

  • Single-cell RNA-seq enables identification of known and novel cell populations by unsupervised clustering of transcriptomic profiles of individual cells

  • We have developed a cell population identification method (ILoReg) that takes an alternative approach to dimensionality reduction by means of feature extraction

  • Iterative clustering projection (ICP) is a clustering algorithm (Fig. 1a, Supplementary Fig. 1 and Methods) that iteratively seeks a clustering of size k that maximizes the similarity, measured by the adjusted Rand index (ARI), between the clustering and its projection by logistic regression
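The one-sentence description of ICP can be made concrete with a simplified Python sketch. It is our own illustration, not ILoReg's R implementation: the helper names (`adjusted_rand_index`, `fit_l1_softmax`, `icp`), the proximal-gradient solver, fitting on a random half of the cells each round, rejecting projections that lose a cluster, and the `ari_target` stopping value are all assumptions. ILoReg's actual ICP may differ in how it subsamples cells, when it stops and how it fits the model. `X` is assumed to be a cells-by-genes matrix of normalized, standardized expression values.

```python
import numpy as np
from math import comb


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two flat clusterings, via their contingency table."""
    _, ai = np.unique(np.asarray(a), return_inverse=True)
    _, bi = np.unique(np.asarray(b), return_inverse=True)
    table = np.zeros((ai.max() + 1, bi.max() + 1), dtype=np.int64)
    np.add.at(table, (ai, bi), 1)

    def pairs(counts):
        return sum(comb(int(c), 2) for c in np.ravel(counts))

    index = pairs(table)
    row_pairs, col_pairs = pairs(table.sum(axis=1)), pairs(table.sum(axis=0))
    expected = row_pairs * col_pairs / comb(len(ai), 2)
    max_index = (row_pairs + col_pairs) / 2
    if max_index == expected:  # degenerate case, e.g. both clusterings are one cluster
        return 1.0
    return (index - expected) / (max_index - expected)


def fit_l1_softmax(X, y, k, lam=0.01, n_iter=300):
    """Multinomial logistic regression with an L1 penalty on W, fit by proximal gradient (ISTA)."""
    n, p = X.shape
    W, b = np.zeros((p, k)), np.zeros(k)
    Y = np.eye(k)[y]  # one-hot targets
    # Conservative step: at most 1/L for the smooth loss in (W, b) jointly.
    step = 1.0 / max(np.linalg.norm(X, 2) ** 2 / n, 1.0)
    for _ in range(n_iter):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        G = (P - Y) / n  # gradient of the mean cross-entropy w.r.t. the logits
        W -= step * (X.T @ G)
        b -= step * G.sum(axis=0)  # the intercept is not penalized
        W = np.sign(W) * np.maximum(np.abs(W) - step * lam, 0.0)  # L1 soft-thresholding
    return W, b


def icp(X, k, max_rounds=50, ari_target=0.9, subsample=0.5, lam=0.01, seed=0):
    """Simplified ICP: alternate between a clustering and its logistic-regression projection."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    labels = rng.integers(0, k, size=n)  # random initial clustering of size k
    best_ari, best_W = -1.0, None
    for _ in range(max_rounds):
        # Assumption: fit on a random subsample so that successive rounds differ.
        idx = rng.choice(n, size=max(k, int(subsample * n)), replace=False)
        W, b = fit_l1_softmax(X[idx], labels[idx], k, lam=lam)
        projected = np.argmax(X @ W + b, axis=1)  # project every cell
        score = adjusted_rand_index(labels, projected)
        # Keep the projection only if it still has k clusters and agrees better.
        if len(np.unique(projected)) == k and score > best_ari:
            best_ari, best_W, labels = score, W, projected
        if best_ari >= ari_target:
            break
    return labels, best_W, best_ari
```

When a projection is accepted, `np.flatnonzero(np.any(best_W != 0, axis=1))` lists the genes (columns of `X`) that kept at least one nonzero coefficient; the remaining genes were dropped by the L1 penalty, which is the sense in which feature selection is tied to the clustering.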


Summary

Introduction

Single-cell RNA-seq (scRNA-seq) enables the identification of known and novel cell populations by unsupervised clustering of the transcriptomic profiles of individual cells. We benchmarked ILoReg against four other clustering methods, Seurat, SC3, CIDR and RaceID3 [6,7,8,9], each of which operates on a largely different principle (Supplementary Table 1), using eleven gold-standard (Pollen) or silver-standard (Baron and van Galen) datasets from three publicly available studies [10,11,12] (Methods and Supplementary Table 2).
