Dimensionality Reduction Step Research Articles

Background: Pseudotime estimation from dynamic single-cell transcriptomic data enables the characterisation and understanding of the underlying processes, for example developmental processes. Various pseudotime estimation methods have been proposed in recent years. Typically, these methods start with a dimension reduction step because the low-dimensional representation is usually easier to analyse. PCA, ICA and t-SNE are among the most widely used dimension reduction methods in pseudotime estimation. However, these methods usually make assumptions about the derived dimensions, which can result in important dataset properties being missed. In this paper, we suggest a new dictionary learning based approach, dynDLT, for dimension reduction and pseudotime estimation of dynamic transcriptomic data. Dictionary learning is a matrix factorisation approach that does not restrict the dependence of the derived dimensions. To evaluate its performance, we conduct a large simulation study and analyse 8 real-world datasets.

Results: The simulation studies reveal, firstly, that dynDLT preserves the simulated patterns in the low-dimensional representation and that the pseudotimes can be derived from this representation. Secondly, the results show that dynDLT is suitable for detecting genes that exhibit the simulated dynamic patterns, which facilitates the interpretation of the compressed representation and thus of the dynamic processes. For the real-world data analysis, we select datasets with samples taken at different time points throughout an experiment. The pseudotimes found by dynDLT correlate highly with the experimental times. We compare the results to other approaches that are used in pseudotime estimation or that are methodologically close to dictionary learning: ICA, NMF, PCA, t-SNE, and UMAP. dynDLT has the best overall performance on the simulated and real-world datasets.

Conclusions: We introduce dynDLT, a method that is suitable for pseudotime estimation. Its main advantages are: (1) it is a model-free approach, meaning that it does not restrict the dependence of the derived dimensions; (2) genes that are relevant in the detected dynamic processes can be identified from the dictionary matrix; (3) because the dictionary entries are restricted to positive values, the dictionary atoms are highly interpretable.
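
For orientation, dictionary learning is commonly posed as the sparse matrix factorisation below. This is the generic textbook objective, not necessarily the exact one dynDLT optimises. The symbols are assumptions introduced here: X is the expression matrix (p genes by n cells), D is the dictionary with one atom d_k per column, C holds the codes c_j (the low-dimensional representation of cell j), m is the number of atoms, and λ is a sparsity weight. The constraint d_k ≥ 0 mirrors the positivity restriction on the dictionary entries that the abstract mentions.

```latex
\min_{D,\,C}\; \tfrac{1}{2}\,\lVert X - D C \rVert_F^2 \;+\; \lambda \sum_{j=1}^{n} \lVert c_j \rVert_1
\quad \text{s.t.} \quad \lVert d_k \rVert_2 \le 1,\;\; d_k \ge 0, \quad k = 1, \dots, m,
\qquad X \in \mathbb{R}^{p \times n},\; D \in \mathbb{R}^{p \times m},\; C \in \mathbb{R}^{m \times n}.
```

Under this reading, the large entries of an atom d_k would point to the genes involved in one pattern, and row k of C would trace that pattern's activity across cells, which is the kind of low-dimensional signal a pseudotime ordering could be derived from.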

High-dimensional biological data collection across heterogeneous groups of samples has become increasingly common, creating high demand for dimensionality reduction techniques that capture the underlying structure of the data. Discovering low-dimensional embeddings that separate any underlying discrete latent structure is an important motivation for applying these techniques, since the latent classes can represent important sources of unwanted variability, such as batch effects, or interesting sources of signal, such as unknown cell types. The features that define this discrete latent structure are often hard to identify in high-dimensional data. Principal component analysis (PCA) is one of the most widely used unsupervised methods for dimensionality reduction. It finds linear transformations of the data that successively capture as much of the total variance as possible. When the goal is to detect discrete structure, PCA is applied under the assumption that classes will be separated along directions of maximum variance, and it will fail to find the discrete latent structure accurately if this assumption does not hold. Visualization techniques such as t-Distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP) attempt to mitigate these problems by creating a low-dimensional embedding in which, with high probability, similar objects are modeled by nearby points and dissimilar objects by distant points. However, since t-SNE and UMAP are computationally expensive, a PCA reduction is often applied before them, which makes them sensitive to PCA's shortcomings. Also, t-SNE as a visualization tool is limited to two or three dimensions, which may not be adequate for retaining discriminatory information. When interpretable feature weights are needed, the linear transformations of PCA are preferable to the non-linear transformations of methods like t-SNE and UMAP.
Here, we propose iterative discriminant analysis (iDA), a dimensionality reduction technique designed to mitigate these limitations. iDA produces an embedding that carries discriminatory information and optimally separates latent clusters, using linear transformations that permit post hoc analysis to determine the features that define these latent structures.
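
The failure mode described above, classes that are not separated along the direction of maximum variance, can be illustrated with a small sketch. It is not iDA itself: it uses Fisher's classical two-class linear discriminant with known labels, whereas iDA targets latent (unknown) classes. The synthetic data, the seed, and all function names are assumptions made for this illustration. Two classes share a large spread along x and differ only by a small shift along y.

```python
import math
import random


def mean(points):
    """Component-wise mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter(points):
    """Entries (sxx, sxy, syy) of the 2x2 scatter matrix about the mean."""
    mx, my = mean(points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    syy = sum((y - my) ** 2 for _, y in points)
    return sxx, sxy, syy


def first_pc(points):
    """Unit vector along the direction of maximum variance (labels ignored)."""
    sxx, sxy, syy = scatter(points)
    # Major-axis angle of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (math.cos(theta), math.sin(theta))


def fisher_direction(class0, class1):
    """Unit vector proportional to Sw^{-1} (m1 - m0), Fisher's discriminant."""
    a0, b0, c0 = scatter(class0)
    a1, b1, c1 = scatter(class1)
    a, b, c = a0 + a1, b0 + b1, c0 + c1  # pooled within-class scatter Sw
    m0, m1 = mean(class0), mean(class1)
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    det = a * c - b * b
    # Closed-form inverse of the 2x2 matrix Sw applied to (dx, dy).
    wx, wy = (c * dx - b * dy) / det, (a * dy - b * dx) / det
    norm = math.hypot(wx, wy)
    return (wx / norm, wy / norm)


def separation(direction, class0, class1):
    """Gap between projected class means in units of the pooled within-class std."""
    p0 = [x * direction[0] + y * direction[1] for x, y in class0]
    p1 = [x * direction[0] + y * direction[1] for x, y in class1]
    mu0, mu1 = sum(p0) / len(p0), sum(p1) / len(p1)
    ss = sum((v - mu0) ** 2 for v in p0) + sum((v - mu1) ** 2 for v in p1)
    return abs(mu1 - mu0) / math.sqrt(ss / (len(p0) + len(p1) - 2))


# Synthetic data (an assumption of this sketch): large shared variance along x,
# small class shift along y.
rng = random.Random(0)  # fixed seed so the sketch is reproducible
class0 = [(rng.gauss(0.0, 5.0), rng.gauss(-1.0, 0.3)) for _ in range(200)]
class1 = [(rng.gauss(0.0, 5.0), rng.gauss(1.0, 0.3)) for _ in range(200)]

pc1 = first_pc(class0 + class1)
ld1 = fisher_direction(class0, class1)
sep_pc = separation(pc1, class0, class1)
sep_ld = separation(ld1, class0, class1)
print(f"PC1 direction {pc1}, class separation {sep_pc:.2f}")
print(f"LD1 direction {ld1}, class separation {sep_ld:.2f}")
```

In this setup the first principal component follows the high-variance x axis and barely separates the classes, while the discriminant direction follows y and separates them clearly. The two components of `ld1` act as feature weights, which is the kind of post hoc interpretability the abstract attributes to linear transformations.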

Articles published on Dimensionality Reduction Step

Dictionary learning allows model-free pseudotime estimation of transcriptomic data

Content-Based Video Retrieval With Prototypes of Deep Features

A Distribution-Dependent Mumford–Shah Model for Unsupervised Hyperspectral Image Segmentation

R-PointHop: A Green, Accurate, and Unsupervised Point Cloud Registration Method.

Multi layered Stacked Ensemble Method with Feature Reduction Technique for Multi-Label Classification

Stability of scRNA-Seq Analysis Workflows is Susceptible to Preprocessing and is Mitigated by Regularized or Supervised Approaches.

Latent Representation Prediction Networks

Learning network embeddings using small graphlets

Identity verification using palm print microscopic images based on median robust extended local binary pattern features and k-nearest neighbor classifier.

Sentiment Analysis in Twitter Based on Knowledge Graph and Deep Learning Classification

Multivariate time-series modeling with generative neural networks

Capturing discrete latent structures: choose LDs over PCs.

Quantile cross-spectral density: A novel and effective tool for clustering multivariate time series

Microbiome Preprocessing Machine Learning Pipeline.

Coupling sparse Cox models with clustering of longitudinal transcriptomics data for trauma prognosis

A novel filter feature selection method for text classification: Extensive Feature Selector

PLS regression algorithms in the presence of nonlinearity

Spectral Independent Component Analysis with noise modeling for M/EEG source separation

Fault Diagnosis in Industrial Processes by Maximizing Pairwise Kullback–Leibler Divergence

Fraud Prediction in Smart Societies Using Logistic Regression and k-fold Machine Learning Techniques
