SAIC: an iterative clustering approach for analysis of single cell RNA-seq data

Lu Yang,Qiang Lu,Arthur D Riggs,Xiwei Wu,Jiancheng Liu

doi:10.1186/s12864-017-4019-5

Abstract

Background

Research interest in single cell analysis has greatly increased in basic, translational and clinical research areas recently, as advances in whole-transcriptome amplification techniques allow scientists to obtain accurate sequencing results at the single cell level. An important step in single-cell transcriptome analysis is to identify distinct cell groups that have different gene expression patterns. Currently, few bioinformatics approaches are available for single-cell RNA-seq analysis. Many studies rely on principal component analysis (PCA) with arbitrary parameters to identify the genes that will be used to cluster the single cells.

Results

We have developed a novel algorithm, called SAIC (Single cell Analysis via Iterative Clustering), that identifies the optimal set of signature genes to separate single cells into distinct groups. Our method uses an iterative clustering approach to perform an exhaustive search for the best parameters within the search space, which is defined by a number of initial centers and P values. The end point is the identification of a signature gene set that gives the best separation of the cell clusters. Using a simulated data set, we showed that SAIC can successfully identify the pre-defined signature gene sets that correctly separate the cells into predefined clusters. We applied SAIC to two published single cell RNA-seq datasets. For both datasets, SAIC was able to identify a subset of signature genes that cluster the single cells into groups consistent with the published results. The signature genes identified by SAIC resulted in better clusters of cells based on the Davies-Bouldin (DB) index score, and many genes also showed tissue-specific expression.

Conclusions

In summary, we have developed an efficient algorithm to identify the optimal subset of genes that separate single cells into distinct clusters based on their expression patterns. We have shown, using published single cell RNA-seq datasets, that it performs better than the PCA method.
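The exhaustive search over initial centers and P values described in the abstract can be pictured with a minimal Python sketch. It assumes that each parameter pair is scored by some clustering-quality measure where lower is better (as with the DB index); the function name `exhaustive_search`, the `evaluate` callback, and the toy scoring function are all hypothetical and are not taken from the SAIC implementation.

```python
from itertools import product


def exhaustive_search(k_values, p_values, evaluate):
    """Score every (k, p) pair and return (best_score, best_k, best_p).

    `evaluate` is a hypothetical callback: given a number of initial
    centers k and a P-value cutoff p, it is assumed to run the clustering
    and gene selection and return a score where lower means better
    cluster separation.
    """
    best = None
    for k, p in product(k_values, p_values):
        score = evaluate(k, p)
        # Keep the first pair seen with the strictly lowest score.
        if best is None or score < best[0]:
            best = (score, k, p)
    return best


def toy_score(k, p):
    """Made-up stand-in for a real clustering score (illustration only)."""
    return abs(k - 3) + p


result = exhaustive_search([2, 3, 4, 5], [0.05, 0.01, 0.001], toy_score)
print(result)
```

In a real pipeline the callback would wrap the clustering and gene-selection steps; the grid itself stays small enough that every combination can be tried.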

Highlights

Research interest in single cell analysis has greatly increased in basic, translational and clinical research areas recently, as advances in whole-transcriptome amplification techniques allow scientists to obtain accurate sequencing results at the single cell level.
We developed an iterative bioinformatics approach that can identify the subset of signature genes whose expression patterns can reliably cluster the single cells into distinct groups.
The results were evaluated by the Davies-Bouldin index and visualized using both a t-SNE 2D plot and an unsupervised hierarchical clustering heatmap.
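For readers unfamiliar with the Davies-Bouldin index, the following is a minimal pure-Python sketch of its textbook definition using Euclidean distance: for each cluster, take the worst ratio of summed within-cluster scatter to between-centroid distance, then average over clusters. Lower values indicate tighter, better separated clusters. The function names and the toy points are illustrative assumptions, and this is not necessarily how SAIC computes the score.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def davies_bouldin(points, labels):
    """Davies-Bouldin index (Euclidean); lower means better separation.

    Assumes at least two clusters with distinct centroids.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    keys = sorted(clusters)
    centers = {k: centroid(clusters[k]) for k in keys}
    # Scatter: mean distance of each cluster's members to its centroid.
    scatter = {
        k: sum(math.dist(p, centers[k]) for p in clusters[k]) / len(clusters[k])
        for k in keys
    }

    total = 0.0
    for i in keys:
        # Worst-case similarity between cluster i and any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in keys
            if j != i
        )
    return total / len(keys)


# Toy 2D points (made up): two tight, distant clusters vs. two loose, close ones.
tight = davies_bouldin([(0, 0), (0, 1), (10, 0), (10, 1)], [0, 0, 1, 1])
loose = davies_bouldin([(0, 0), (0, 4), (3, 0), (3, 4)], [0, 0, 1, 1])
print(tight, loose)
```

The tight, well-separated toy clusters score lower than the loose, nearby ones, which is the sense in which a lower index reflects "better clusters".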

Summary

Introduction

Research interest in single cell analysis has greatly increased in basic, translational and clinical research areas recently, as advances in whole-transcriptome amplification techniques allow scientists to obtain accurate sequencing results at the single cell level. Many current single cell data analysis approaches focus only on the clustering algorithm and do not search for signature genes that can benefit the clustering step. These methods are applied to genes filtered by RPKM [7, 8] values or to the top genes that have the largest residuals after fitting a simple noise model [9]. At this scale, clustering results may be affected or even driven by the noise embedded in the gene expression data. For downstream analysis, such as biological validation and marker gene selection, it would be very difficult to study a large number of genes. Ideally, a smaller subset of genes could be selected that is capable of clustering the cells into distinct groups.

Journal: BMC Genomics
Publication Date: Oct 1, 2017
Citations: 36
License type: open-access
