With the scale of data growing every day, reducing the dimensionality (a.k.a. sketching) of high-dimensional data has emerged as a task of paramount importance. Relevant challenges in this context include the sheer volume of data, which may consist of categorical samples, the typically streaming mode of acquisition, and possibly missing entries. To cope with these challenges, the present paper develops a novel categorical subspace learning approach to unravel the latent structure of three prominent categorical (bilinear) models, namely Probit, Tobit, and Logit. The deterministic Probit and Tobit models treat data as quantized values of an analog-valued process lying in a low-dimensional subspace, while the probabilistic Logit model relies on the low dimensionality of the data log-likelihood ratios. Leveraging the low intrinsic dimensionality of the sought models, a rank-regularized maximum-likelihood estimator is devised, which is then solved recursively via alternating majorization-minimization to sketch high-dimensional categorical data `on the fly.' The resulting procedure alternates between sketching each new incomplete datum and refining the latent subspace, leading to lightweight first-order algorithms with highly parallelizable tasks per iteration. As an extra degree of freedom, the quantization thresholds are learned jointly with the subspace to enhance the predictive power of the sought models. Performance of the subspace iterates is analyzed for both infinite and finite data streams: for the former, asymptotic convergence to the stationary-point set of the batch estimator is established, while for the latter, sublinear regret bounds on the empirical cost are derived. Numerical tests with both synthetic and real-world datasets corroborate the merits of the novel schemes for real-time movie recommendation and chess-game classification.
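To make the alternating recursion concrete, the following minimal Python sketch instantiates the Logit variant under stated assumptions: Frobenius-norm penalties on the subspace and the sketch coefficients serve as the standard surrogate for rank regularization, and each arriving datum triggers one sketching step followed by one first-order subspace refinement. This is an illustrative sketch rather than the paper's exact algorithm; the function names (`sketch_datum`, `update_subspace`), step sizes, and the toy data stream are hypothetical choices.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sketch_datum(L, y, mask, lam=0.1, steps=50, lr=0.5):
    """Sketch one incomplete binary datum: estimate psi in R^r from
    the observed entries of y under the Logit model, via gradient steps
    on the regularized logistic loss."""
    psi = np.zeros(L.shape[1])
    Lo, yo = L[mask], y[mask]              # observed rows and entries only
    for _ in range(steps):
        grad = Lo.T @ (sigmoid(Lo @ psi) - yo) + lam * psi
        psi -= lr * grad / max(1, mask.sum())
    return psi

def update_subspace(L, y, mask, psi, lam=0.1, mu=0.1):
    """One stochastic first-order refinement of the latent subspace L,
    touching only the rows indexed by the observed entries."""
    resid = sigmoid(L[mask] @ psi) - y[mask]
    grad = np.zeros_like(L)
    grad[mask] = np.outer(resid, psi)      # gradient of the logistic fit
    return L - mu * (grad + lam * L)       # lam*L: rank-regularization surrogate

# Toy stream: D-dimensional binary data with ~30% misses, true rank r.
rng = np.random.default_rng(0)
D, r, T = 100, 5, 500
L_true = rng.standard_normal((D, r))
L = 0.1 * rng.standard_normal((D, r))      # subspace iterate
for t in range(T):
    p = sigmoid(L_true @ rng.standard_normal(r))
    y = (p > rng.random(D)).astype(float)  # Bernoulli draw per coordinate
    mask = rng.random(D) > 0.3             # observed-entry indicator
    psi = sketch_datum(L, y, mask)         # sketch the new incomplete datum
    L = update_subspace(L, y, mask, psi)   # refine the latent subspace
```

Per iteration, the cost scales with the number of observed entries, and the row-wise gradient updates decouple across coordinates, which is what makes the first-order steps lightweight and parallelizable.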