K-nearest-neighbors induced topological PCA for single cell RNA-sequence data analysis

Sean Cottrell, Yuta Hozumi, Guo-Wei Wei

doi:10.1016/j.compbiomed.2024.108497

Abstract

Single-cell RNA sequencing (scRNA-seq) is widely used to reveal cellular heterogeneity and has given insights into cell–cell communication, cell differentiation, and differential gene expression. However, analyzing scRNA-seq data is challenging because of its sparsity and the large number of genes involved. Dimensionality reduction and feature selection are therefore important for removing spurious signals and enhancing downstream analysis. Traditional PCA, the main workhorse of dimensionality reduction, cannot capture the geometric structure embedded in the data, and previous graph Laplacian regularizations are limited to a single scale. We propose a topological Principal Component Analysis (tPCA) method that combines the persistent Laplacian (PL) technique with L2,1-norm regularization to address multiscale and multiclass heterogeneity in data. We further introduce a k-Nearest-Neighbor (kNN) persistent Laplacian technique to improve the robustness of our persistent Laplacian method. The proposed kNN-PL is a new algebraic topology technique that addresses many limitations of traditional persistent homology. Rather than inducing a filtration by varying a distance threshold, kNN-tPCA achieves the filtration by varying the number of neighbors in a kNN network at each step, and we find that this framework has significant implications for hyperparameter tuning. We validate the efficacy of tPCA and kNN-tPCA on 11 diverse benchmark scRNA-seq datasets and show that our methods outperform other unsupervised PCA enhancements from the literature, as well as the popular Uniform Manifold Approximation and Projection (UMAP), t-Distributed Stochastic Neighbor Embedding (tSNE), and Projection Non-Negative Matrix Factorization (NMF), by significant margins.
For example, tPCA provides up to 628%, 78%, and 149% improvements over UMAP, tSNE, and NMF, respectively, in classification F1 score, and kNN-tPCA offers 53%, 63%, and 32% improvements over UMAP, tSNE, and NMF, respectively, in clustering adjusted Rand index (ARI).
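
The central idea, filtering a dataset by the neighbor count k rather than a distance threshold and folding the resulting graphs into a Laplacian-regularized PCA, can be sketched in a few lines of NumPy. This is a minimal sketch under loudly stated assumptions, not the authors' implementation. It uses ordinary graph Laplacians L_k = D_k - A_k of the nested kNN graphs as a stand-in for the persistent Laplacian and combines them with hypothetical weights. It also swaps the paper's L2,1-norm formulation for the simpler Frobenius-norm graph-Laplacian PCA (gLPCA) objective. For a gene-centered cells-by-genes matrix Xc, this objective minimizes ||Xc^T - U Q^T||_F^2 + alpha tr(Q^T L Q) subject to Q^T Q = I, and its minimizer Q consists of the eigenvectors of -Xc Xc^T + alpha L with the smallest eigenvalues. The function names, the k values, the weights, and alpha are all illustrative.

```python
import numpy as np


def knn_adjacency(X: np.ndarray, k: int) -> np.ndarray:
    """Symmetric 0/1 adjacency matrix of the k-nearest-neighbor graph on the rows (cells) of X."""
    n = X.shape[0]
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                     # a cell is never its own neighbor
    # A stable sort keeps tie-breaking consistent, so the k-neighbor list is a
    # prefix of the (k+1)-neighbor list and the graphs are nested.
    nbrs = np.argsort(d2, axis=1, kind="stable")[:, :k]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    return np.maximum(A, A.T)  # edge if either endpoint lists the other


def knn_filtration_laplacian(X, ks, weights):
    """Weighted sum of graph Laplacians L_k = D_k - A_k over increasing k.

    Hypothetical stand-in for kNN-PL: plain graph Laplacians, not persistent ones.
    """
    n = X.shape[0]
    L = np.zeros((n, n))
    for k, w in zip(ks, weights):
        A = knn_adjacency(X, k)
        L += w * (np.diag(A.sum(axis=1)) - A)
    return L


def laplacian_regularized_pca(X, L, n_components, alpha):
    """gLPCA-style embedding (Frobenius norm, not the paper's L2,1 norm).

    Minimizes ||Xc^T - U Q^T||_F^2 + alpha * tr(Q^T L Q) with Q^T Q = I, where
    Xc is X with each gene (column) centered. The optimal Q holds the
    eigenvectors of -Xc Xc^T + alpha * L with the smallest eigenvalues.
    """
    Xc = X - X.mean(axis=0)
    G = -Xc @ Xc.T + alpha * L
    _, evecs = np.linalg.eigh(G)     # eigenvalues come back in ascending order
    return evecs[:, :n_components]   # n_cells x n_components embedding


# Usage on synthetic data: two groups of 20 "cells" with 50 "genes" each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 50)), rng.normal(4.0, 1.0, (20, 50))])
L = knn_filtration_laplacian(X, ks=[3, 5, 8], weights=[1.0, 0.5, 0.25])
Q = laplacian_regularized_pca(X, L, n_components=2, alpha=1.0)
print(Q.shape)  # (40, 2)
```

Because the neighbor ordering is computed with a stable sort, each kNN graph is contained in the next, which is the nesting a filtration requires. Setting alpha = 0 removes the topological term, and the embedding then spans the same subspace as ordinary PCA scores.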

Journal: Computers in Biology and Medicine | Publication Date: Apr 24, 2024
Citations: 4

Similar Papers

Preprocessing of Single Cell RNA Sequencing Data Using Correlated Clustering and Projection.
Yuta Hozumi ... Guo-Wei Wei
Journal of Chemical Information and Modeling | VOL. 64 | 04 Jul 2023

Revisiting Dimensionality Reduction Techniques for Visual Cluster Analysis: An Empirical Study.
Jiazhi Xia ... Yuchen Zhang
IEEE Transactions on Visualization and Computer Graphics | VOL. 28 | 01 Jan 2021

A Comparison for Dimensionality Reduction Methods of Single-Cell RNA-seq Data.
Ruizhi Xiang ... Chaohan Xu
Frontiers in Genetics | VOL. 12 | 23 Mar 2021

Decision letter: Identification of phenotypically, functionally, and anatomically distinct stromal niche populations in human bone marrow based on single-cell RNA sequencing
Dirk Strunk ... Mone Zaidi
06 Sep 2022