Abstract

Single-cell RNA sequencing provides a way to identify marker genes of different cells, which lays the foundation for discovering new cell types. The general strategy is to build a clustering pipeline and derive differentially expressed genes, followed by cell type enrichment analysis and driving force analysis. Throughout this process, choosing a clustering model and choosing an appropriate dimension reduction method are two vital and challenging tasks. In this study, we present LAK, a computational pipeline for single-cell RNA-seq clustering analysis that uses a Lasso and K-means based feature selection method to select candidate genes. To deal with sparse, high-dimensional data, we integrate a Lasso penalty into the clustering method as a feature selection step, which extracts the genes that actually affect the clustering. We also improve the parameter selection algorithm so that it searches for appropriate parameters automatically by binary search, according to the size of the data. Compared with other computational approaches, LAK achieves better reliability, stability, convenience and accuracy on real datasets, simulated data, and datasets with a large number of dropout events.
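To make the idea of a Lasso penalty inside K-means concrete, the following is a minimal sketch of one well-known formulation, sparse K-means, in which each gene receives a non-negative weight and an L1 bound on the weights pushes uninformative genes to zero. It is an illustration only and not the LAK implementation: the function names, the bound `s`, and the use of binary search for the shrinkage threshold are assumptions made for this example, and the abstract does not specify how LAK's own automatic parameter search works.

```python
import numpy as np


def soft_threshold(a, delta):
    """Lasso-style shrinkage: pull every entry toward zero by delta."""
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)


def update_weights(bcss, s, n_iter=40):
    """Gene weights w >= 0 with ||w||_2 = 1 and ||w||_1 <= s (assumes s >= 1)."""
    a = np.maximum(bcss, 0.0)
    w = a / np.linalg.norm(a)
    if w.sum() <= s:
        return w  # the L1 bound is already satisfied, no shrinkage needed
    lo, hi, best = 0.0, a.max(), None
    for _ in range(n_iter):  # binary search on the shrinkage threshold
        delta = (lo + hi) / 2.0
        w = soft_threshold(a, delta)
        w = w / np.linalg.norm(w)
        if w.sum() > s:
            lo = delta  # still too dense: shrink harder
        else:
            hi, best = delta, w  # feasible: try shrinking less
    return best if best is not None else w


def kmeans(X, k, rng, n_iter=100):
    """Plain Lloyd's algorithm; returns one cluster label per cell."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def between_cluster_ss(X, labels):
    """Per-gene between-cluster sum of squares (total minus within)."""
    total = ((X - X.mean(axis=0)) ** 2).sum(axis=0)
    within = np.zeros(X.shape[1])
    for c in np.unique(labels):
        Xc = X[labels == c]
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return total - within


def sparse_kmeans(X, k, s, n_outer=10, seed=0):
    """Alternate weighted K-means and L1-constrained weight updates.

    X is a cells-by-genes expression matrix; s is a hypothetical
    sparsity bound chosen by the caller (smaller s keeps fewer genes).
    """
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    w = np.full(p, 1.0 / np.sqrt(p))  # start with equal gene weights
    for _ in range(n_outer):
        # Scaling by sqrt(w) turns squared distance into a w-weighted one.
        labels = kmeans(X * np.sqrt(w), k, rng)
        w = update_weights(between_cluster_ss(X, labels), s)
    return labels, w
```

Calling `sparse_kmeans(X, k, s)` on a cells-by-genes matrix returns cluster labels together with the gene weights; genes whose weight is zero play no part in the clustering, which is the sense in which the penalty acts as feature selection. Note that the binary search here only finds the shrinkage threshold for a fixed `s`; selecting `s` itself would need a separate procedure.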

Highlights

  • Single-cell RNA sequencing technology is a powerful tool that offers unprecedented precision in exploring biological processes and disease mechanisms

  • We applied our pipeline to the Zeisel dataset and validated our method by comparing the differentially expressed genes in our clusters with the marker genes provided by the dataset's authors

  • Comparing our differentially expressed genes with the marker genes provided by the dataset's authors (Thy1, Gad1, Tbr1, Spink8, Mbp, Aldoc, Aif1, Cldn5, Acta2), we find that most of our clusters can be matched to a unique marker

Introduction

Single-cell RNA sequencing (scRNA-seq) is a powerful tool that offers unprecedented precision in exploring biological processes and disease mechanisms. With single-cell RNA-seq analysis, somatic mutations at the level of individual cells and the cell types in a sample can be characterized with high precision [5], [6]. The major advantage of scRNA-seq is that it enables unsupervised learning of population structure: by dissecting complex and heterogeneous cell populations effectively, it uncovers novel subtypes and rare cell types and facilitates a deeper understanding of cell heterogeneity [7]. One of the most important tasks in single-cell RNA-seq data analysis is unsupervised single-cell clustering, which aims to group the unknown cells of a sample into clusters using a clustering algorithm.
