Data-driven assessment of dimension reduction quality for single-cell omics data

Xiaoru Dong, Rhonda Bacher

doi:10.1016/j.patter.2022.100465

Abstract

Dimension reduction (DR) techniques have become synonymous with single-cell omics data due to their ability to generate attractive visualizations and enable analyses of high-dimensional data. In this issue of Patterns, Johnson et al. develop a statistical approach to assist in selecting high-quality reduced representations to improve analyses and biological interpretations.

Highlights

Dimension reduction (DR) involves projecting high-dimensional data into a lower-dimensional space in order to reduce noise in the data while retaining key features
The traditional and most familiar form of DR is done via principal component analysis (PCA), which performs linear transformations and preserves the Euclidean distance between features
Choosing an appropriate DR method, one that is able to retain the structure of original data and impose the least distortion of biological signals, is a priority
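
As an illustration of the linear case described above, the following is a minimal sketch of PCA computed through a singular value decomposition with NumPy. The function name `pca_project`, the toy matrix `X` (a random stand-in for a cells-by-features table), and the choice of two components are assumptions made for this example; they are not taken from Johnson et al. or from any particular single-cell toolkit.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X onto its top principal components.

    Hypothetical helper for illustration only.
    Returns the projected coordinates and the fraction of variance
    explained by each retained component.
    """
    # Center each feature (column) so the components describe variance
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Linear transformation of the data into the reduced space
    scores = X_centered @ components.T
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:n_components]


# Toy data (assumption): 200 "cells" by 50 "features" of Gaussian noise,
# with the first 100 cells shifted in 5 features to create two groups.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
X[:100, :5] += 3.0

scores, explained = pca_project(X, n_components=2)
```

In this toy setting the first component should mostly capture the shift between the two groups, while the remaining 48 discarded directions carry mainly noise. Real single-cell data rarely separate this cleanly, which is part of why the choice of DR method matters.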

Summary

Introduction

DR involves projecting high-dimensional data into a lower-dimensional space in order to reduce noise in the data while retaining key features. More recent nonlinear approaches, such as t-distributed stochastic neighbor embedding (t-SNE)[3] and uniform manifold approximation and projection (UMAP),[4] have become popular for single-cell data and are highly regarded for their ability to produce appealing visualizations of cell clusters. Concerns about DR approaches applied to single-cell data have largely led to the development of novel DR methods or heuristic guidelines based on benchmarking studies.[6,7] Yet choosing an optimal DR method for a given dataset and analysis remains an open question.

Results

Conclusion