Abstract

The major interest domains of single-cell RNA sequential analysis are identification of existing and novel types of cells, depiction of cells, cell fate prediction, classification of several types of tumor, and investigation of heterogeneity in different cells. Single-cell clustering plays an important role to solve the aforementioned questions of interest. Cluster identification in high dimensional single-cell sequencing data faces some challenges due to its nature. Dimensionality reduction models can solve the problem. Here, we introduce a potential cluster specified frequent biomarkers discovery framework using dimensionality reduction and hierarchical agglomerative clustering Louvain for single-cell RNA sequencing data analysis. First, we pre-filtered the features with fewer number of cells and the cells with fewer number of features. Then we created a Seurat object to store data and analysis together and used quality control metrics to discard low quality or dying cells. Afterwards we applied global-scaling normalization method “LogNormalize” for data normalization. Next, we computed cell-to-cell highly variable features from our dataset. Then, we applied a linear transformation and linear dimensionality reduction technique, Principal Component Analysis (PCA) to project high dimensional data to an optimal low-dimensional space. After identifying fifty “significant”principal components (PCs) based on strong enrichment of low p-value features, we implemented a graph-based clustering algorithm Louvain for the cell clustering of 10 top significant PCs. We applied our model to a single-cell RNA sequential dataset for a rare intestinal cell type in mice (NCBI accession ID:GSE62270, 23,630 features and 1872 samples (cells)). We obtained 10 cell clusters with a maximum modularity of 0.885 1. After detecting the cell clusters, we found 3871 cluster-specific biomarkers using an expression feature extraction statistical tool for single-cell sequencing data, Model-based Analysis of Single-cell Transcriptomics (MAST) with a log 2 FC threshold of 0.25 and a minimum feature detection of 25%. From these cluster-specific biomarkers, we found 1892 most frequent markers, i.e., overlapping biomarkers. We performed degree hub gene network analysis using Cytoscape and reported the five highest degree genes (Rps4x, Rps18, Rpl13a, Rps12 and Rpl18a). Subsequently, we performed KEGG pathway and Gene Ontology enrichment analysis of cluster markers using David 6.8 software tool. In summary, our proposed framework that integrated dimensionality reduction and agglomerative hierarchical clustering provides a robust approach to efficiently discover cluster-specific frequent biomarkers, i.e., overlapping biomarkers from single-cell RNA sequencing data.

Highlights

  • Single-cell RNA sequencing technology plays a vital role in medical fields such as oncology, digestive and urinary systems, microbiology, neurology, reproduction, and immunology (Tang et al, 2019)

  • We provided an extensive analysis by integrating dimensionality reduction technique and clustering algorithm for detecting cluster-specific frequent biomarkers in single-cell RNA sequencing data

  • 2.2.2 Compute Quality Control Metrics and Cell Filtration In this step, we explored quality control (QC) metrics based on user defined criteria for the selection and filtration of cells

Read more

Summary

INTRODUCTION

Single-cell RNA sequencing (scRNAseq) technology plays a vital role in medical fields such as oncology, digestive and urinary systems, microbiology, neurology, reproduction, and immunology (Tang et al, 2019). The authors claimed that their model has improved clustering performance for labeling individual singlecells, as well as the accurate estimation of number of clusters Their method faces several analytical and technical challenges in the analysis of large-scale single cell data due to high dimensionality, sparse matrix computation, and rare cell detection (Feng et al, 2020). We provide a dimensionality reduction integrated clustering model for detecting cluster-specific biomarkers in single-cell sequencing data We applied it in a single-cell RNA sequential dataset for a rare intestinal cell type in mice (NCBI accession ID:GSE62270) (Grün et al, 2015). After detecting the cell clusters, we identified cluster-specific biomarkers using an expression feature extraction statistical tool for single-cell sequencing data, Model-based Analysis of Single-cell Transcriptomics (MAST). Our proposed integrated framework using dimensionality reduction and hierarchical agglomerative clustering efficiently discovers clusterspecific frequent biomarkers, i.e. overlapping biomarkers from single-cell RNA sequencing data

Preprocessing of Single-Cell RNA Sequencing Data
Highly Variable Features Identification
Linear Transformation and Linear Dimensionality Reduction
Cell Cluster
Finding the Cluster-Specific Biomarkers
Hub Gene Finding
Gene Set Enrichment Analysis
AND DISCUSSION
CONCLUSION AND FUTURE WORK
Findings
DATA AVAILABILITY STATEMENT
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call