Dimensionality Reduction and Louvain Agglomerative Hierarchical Clustering for Cluster-Specified Frequent Biomarker Discovery in Single-Cell Sequencing Data.

Soumita Seth, Tapas Bhadra, Saurav Mallik, Zhongming Zhao

doi:10.3389/fgene.2022.828479

Abstract

The major areas of interest in single-cell RNA sequencing analysis are the identification of existing and novel cell types, depiction of cells, cell fate prediction, classification of tumor types, and investigation of heterogeneity among cells. Single-cell clustering plays an important role in addressing these questions. Cluster identification in high-dimensional single-cell sequencing data is challenging because of the high dimensionality and sparsity of the data. Dimensionality reduction models can mitigate this problem. Here, we introduce a framework for discovering potential cluster-specific frequent biomarkers that combines dimensionality reduction with the hierarchical agglomerative clustering algorithm Louvain for single-cell RNA sequencing data analysis. First, we pre-filtered out the features detected in too few cells and the cells with too few detected features. Then we created a Seurat object to store the data and analysis together and used quality control metrics to discard low-quality or dying cells. Afterwards, we applied the global-scaling normalization method “LogNormalize” for data normalization. Next, we identified the features that vary most from cell to cell in our dataset. Then, we applied a linear transformation followed by a linear dimensionality reduction technique, Principal Component Analysis (PCA), to project the high-dimensional data onto an optimal low-dimensional space. After identifying fifty “significant” principal components (PCs) based on strong enrichment of low p-value features, we applied the graph-based clustering algorithm Louvain to the top 10 significant PCs to cluster the cells. We applied our model to a single-cell RNA sequencing dataset for a rare intestinal cell type in mice (NCBI accession ID: GSE62270; 23,630 features and 1,872 samples (cells)). We obtained 10 cell clusters with a maximum modularity of 0.8851.
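The pre-filtering and “LogNormalize” steps described above can be illustrated with a minimal NumPy sketch. This is not the authors' Seurat code: the function names are hypothetical, and the thresholds (`min_cells=3`, `min_features=200`) and the scale factor of 10,000 are assumptions taken from common Seurat defaults, since the abstract does not state the values used.

```python
import numpy as np


def prefilter(counts, min_cells=3, min_features=200):
    """Drop rarely detected genes and low-complexity cells.

    counts: genes x cells matrix of raw counts.
    min_cells / min_features are assumed thresholds, not the paper's values.
    """
    counts = np.asarray(counts)
    # Keep genes detected (count > 0) in at least `min_cells` cells.
    keep_genes = (counts > 0).sum(axis=1) >= min_cells
    counts = counts[keep_genes]
    # Keep cells with at least `min_features` detected genes.
    keep_cells = (counts > 0).sum(axis=0) >= min_features
    return counts[:, keep_cells], keep_genes, keep_cells


def log_normalize(counts, scale_factor=10_000):
    """Global-scaling normalization: scale each cell to `scale_factor`
    total counts, then apply log(1 + x)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=0, keepdims=True)
    totals[totals == 0] = 1.0  # avoid division by zero for empty cells
    return np.log1p(counts / totals * scale_factor)
```

For example, `log_normalize(prefilter(raw)[0])` would chain the two steps on a hypothetical genes-by-cells array `raw`.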
After detecting the cell clusters, we found 3,871 cluster-specific biomarkers using Model-based Analysis of Single-cell Transcriptomics (MAST), a statistical tool for expression feature extraction from single-cell sequencing data, with a log2 fold-change (log2FC) threshold of 0.25 and a minimum feature detection rate of 25%. From these cluster-specific biomarkers, we found 1,892 most frequent markers, i.e., overlapping biomarkers. We performed degree-based hub gene network analysis using Cytoscape and reported the five highest-degree genes (Rps4x, Rps18, Rpl13a, Rps12 and Rpl18a). Subsequently, we performed KEGG pathway and Gene Ontology enrichment analysis of the cluster markers using the DAVID 6.8 software tool. In summary, our proposed framework, which integrates dimensionality reduction and agglomerative hierarchical clustering, provides a robust approach to efficiently discover cluster-specific frequent biomarkers, i.e., overlapping biomarkers, from single-cell RNA sequencing data.
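The thresholding and overlap steps can be sketched as follows. This does not reproduce MAST itself; it only shows how a table of per-cluster test results might be filtered with the stated thresholds (log2FC of 0.25, detection in 25% of cells) and how overlapping markers could then be collected. The function names and the record layout are hypothetical, the use of the absolute log2FC is an assumption, and reading "frequent" as "a marker of at least two clusters" is an assumed interpretation of the abstract.

```python
def filter_markers(records, min_log2fc=0.25, min_pct=0.25):
    """Keep (gene, cluster) pairs that pass both thresholds.

    records: iterable of (gene, cluster, log2fc, pct_detected) tuples,
    a hypothetical layout for per-cluster differential expression results.
    """
    return [
        (gene, cluster)
        for gene, cluster, log2fc, pct in records
        if abs(log2fc) >= min_log2fc and pct >= min_pct
    ]


def frequent_markers(markers, min_clusters=2):
    """Return genes that are markers of at least `min_clusters` clusters
    (assumed meaning of "frequent", i.e. overlapping, biomarkers)."""
    clusters_per_gene = {}
    for gene, cluster in markers:
        clusters_per_gene.setdefault(gene, set()).add(cluster)
    return sorted(
        gene for gene, clusters in clusters_per_gene.items()
        if len(clusters) >= min_clusters
    )
```

With toy records in which a placeholder gene passes the thresholds in two clusters, `frequent_markers(filter_markers(records))` returns that gene and drops genes that mark only one cluster.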

Highlights

Single-cell RNA sequencing technology plays a vital role in medical fields such as oncology, digestive and urinary systems, microbiology, neurology, reproduction, and immunology (Tang et al., 2019).
We provided an extensive analysis by integrating a dimensionality reduction technique and a clustering algorithm for detecting cluster-specific frequent biomarkers in single-cell RNA sequencing data.
Compute Quality Control Metrics and Cell Filtration: In this step, we explored quality control (QC) metrics based on user-defined criteria for the selection and filtration of cells.
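A minimal sketch of QC-based cell filtration under user-defined criteria is given here for illustration. The specific metrics (number of detected features and percentage of mitochondrial counts), the `mt-` gene-name prefix, and all threshold values are assumptions; the text does not state which criteria the authors used.

```python
import numpy as np


def qc_metrics(counts, gene_names, mito_prefix="mt-"):
    """Per-cell QC metrics from a genes x cells count matrix.

    Returns the number of detected features and the percentage of counts
    from mitochondrial genes (identified by an assumed name prefix).
    """
    counts = np.asarray(counts, dtype=float)
    n_features = (counts > 0).sum(axis=0)
    total = counts.sum(axis=0)
    is_mito = np.array(
        [name.lower().startswith(mito_prefix.lower()) for name in gene_names]
    )
    mito = counts[is_mito].sum(axis=0)
    # Percentage of mitochondrial counts; 0 for cells with no counts at all.
    pct_mito = np.divide(
        100.0 * mito, total, out=np.zeros_like(total), where=total > 0
    )
    return n_features, pct_mito


def filter_cells(n_features, pct_mito,
                 min_features=200, max_features=2500, max_pct_mito=5.0):
    """Boolean mask of cells to keep; all thresholds are user-defined
    placeholders, not values taken from the paper."""
    return (
        (n_features > min_features)
        & (n_features < max_features)
        & (pct_mito < max_pct_mito)
    )
```

A high mitochondrial percentage is commonly treated as a sign of low-quality or dying cells, which is why it appears in this sketch.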

Summary

INTRODUCTION

Single-cell RNA sequencing (scRNAseq) technology plays a vital role in medical fields such as oncology, digestive and urinary systems, microbiology, neurology, reproduction, and immunology (Tang et al., 2019). The authors claimed that their model improves clustering performance for labeling individual single cells and gives an accurate estimate of the number of clusters. Their method faces several analytical and technical challenges in the analysis of large-scale single-cell data due to high dimensionality, sparse matrix computation, and rare cell detection (Feng et al., 2020). We provide a clustering model integrated with dimensionality reduction for detecting cluster-specific biomarkers in single-cell sequencing data. We applied it to a single-cell RNA sequencing dataset for a rare intestinal cell type in mice (NCBI accession ID: GSE62270) (Grün et al., 2015). After detecting the cell clusters, we identified cluster-specific biomarkers using Model-based Analysis of Single-cell Transcriptomics (MAST), a statistical tool for expression feature extraction from single-cell sequencing data. Our proposed integrated framework, which uses dimensionality reduction and hierarchical agglomerative clustering, efficiently discovers cluster-specific frequent biomarkers, i.e., overlapping biomarkers, from single-cell RNA sequencing data.
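Louvain clustering works by greedily merging communities so as to increase the modularity of a graph partition. As a rough illustration of the quantity being maximized, the sketch below computes the standard Newman modularity of a given partition of an undirected graph. The function name is hypothetical, and how the cell-to-cell graph is built from the principal components is not covered here.

```python
import numpy as np


def modularity(adjacency, labels):
    """Modularity Q of a partition of an undirected (weighted) graph.

    Q = (1 / 2m) * sum_ij (A_ij - k_i * k_j / 2m) * [c_i == c_j],
    where A is the symmetric adjacency matrix, k_i the degree of node i,
    2m the total of all entries of A, and c_i the community of node i.
    """
    A = np.asarray(adjacency, dtype=float)
    labels = np.asarray(labels)
    two_m = A.sum()
    if two_m == 0:
        return 0.0  # graph without edges: modularity taken as 0
    degrees = A.sum(axis=1)
    same_community = labels[:, None] == labels[None, :]
    expected = np.outer(degrees, degrees) / two_m
    return float(((A - expected) * same_community).sum() / two_m)
```

For a graph made of two disconnected edges, assigning each edge to its own community gives Q = 0.5, while placing all four nodes in one community gives Q = 0.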

Preprocessing of Single-Cell RNA Sequencing Data

Highly Variable Features Identification

Linear Transformation and Linear Dimensionality Reduction

Cell Cluster

Finding the Cluster-Specific Biomarkers

Hub Gene Finding

Gene Set Enrichment Analysis

RESULTS AND DISCUSSION

CONCLUSION AND FUTURE WORK

Findings

DATA AVAILABILITY STATEMENT

Journal: Frontiers in genetics	Publication Date: Feb 7, 2022
Citations: 13	License type: CC BY 4.0
