Abstract
We propose a novel framework for the deterministic construction of linear, near-isometric embeddings of a finite set of data points. Given a set of training points X ⊂ ℝ^N, we consider the secant set S(X) that consists of all pairwise difference vectors of X, normalized to lie on the unit sphere. We formulate an affine rank minimization problem to construct a matrix Ψ that preserves the norms of all the vectors in S(X) up to a distortion parameter δ. While affine rank minimization is NP-hard, we show that this problem can be relaxed to a convex formulation that can be solved using a tractable semidefinite program (SDP). In order to enable scalability of our proposed SDP to very large-scale problems, we adopt a two-stage approach. First, in order to reduce compute time, we develop a novel algorithm based on the Alternating Direction Method of Multipliers (ADMM) that we call Nuclear norm minimization with Max-norm constraints (NuMax) to solve the SDP. Second, we develop a greedy, approximate version of NuMax based on the column generation method commonly used to solve large-scale linear programs. We demonstrate that our framework is useful for a number of signal processing applications via a range of experiments on large-scale synthetic and real datasets.
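To make the secant set and the distortion parameter concrete, the following is a minimal Python sketch, not the authors' implementation. It builds the normalized secant set S(X) from the rows of a data matrix and measures how far a candidate matrix Ψ is from preserving the secant norms. The function names `normalized_secants` and `max_distortion` are hypothetical, and the distortion is assumed to be measured as | ||Ψv||² − 1 | over unit-norm secants v, which is one common convention; the abstract does not spell out the exact form.

```python
import numpy as np

def normalized_secants(X):
    """Return all pairwise difference vectors of the rows of X, scaled to unit norm."""
    n = X.shape[0]
    i, j = np.triu_indices(n, k=1)        # each unordered pair of points once
    diffs = X[i] - X[j]
    norms = np.linalg.norm(diffs, axis=1)
    keep = norms > 0                       # identical points define no secant direction
    return diffs[keep] / norms[keep, None]

def max_distortion(Psi, S):
    """Largest deviation of ||Psi v||^2 from 1 over the unit-norm secants in S.

    Assumption: distortion is measured on squared norms; the source only
    says norms are preserved "up to a distortion parameter delta".
    """
    sq_norms = np.sum((S @ Psi.T) ** 2, axis=1)
    return float(np.max(np.abs(sq_norms - 1.0)))

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 20))          # 50 hypothetical training points in R^20
S = normalized_secants(X)                  # 50 * 49 / 2 secants, one per row
Psi = np.eye(20)                           # the identity preserves every norm
print(S.shape, max_distortion(Psi, S))
```

A matrix Ψ with fewer rows than columns would be accepted by the framework only if `max_distortion(Psi, S)` stays at or below the chosen δ. Note that the number of secants grows quadratically with the number of points, which is the scaling issue the two-stage approach is meant to address.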
Highlights
In many applications, we seek a low-dimensional representation of data that are elements of a high-dimensional ambient space
If the training set X comprises sufficiently many points that are uniformly drawn from a low-dimensional smooth manifold M, we show that the matrix Ψ satisfies the restricted isometry property (RIP) for signals belonging to M and enables the design of efficient measurement matrices for the compressive sensing of manifold-modeled datasets
The Sparse Manifold Clustering and Embedding (SMCE) approach, proposed in [24], aims to address this issue by constructing an embedding that operates directly on the normalized secant set S(X); however, SMCE relies on a spectral decomposition that does not seem to lead to isometry guarantees
Summary
We seek a low-dimensional representation (or embedding) of data that are elements of a high-dimensional ambient space. With high probability, Φ is near-isometric under a certain lower bound on M [3, 4]. This approach can be extended to signal classes beyond finite point clouds, including points that lie on compact, differentiable low-dimensional manifolds [5, 6] as well as pairwise distances between all sparse signals [7]. In order to achieve scalability to large-scale problems, we propose a modified, greedy version of NuMax that mirrors the column generation approach commonly used to solve large-scale linear programs [12]. With this modification, NuMax can efficiently solve problems where the number of elements in the secant set S(X), i.e., the number of constraints in (4), is extremely large (e.g., 10^7 or greater). By carefully pruning the secant set S(X), we can tailor Ψ for more general signal inference tasks, such as supervised binary classification.
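As a point of comparison for the near-isometry statement cited to [3, 4], the sketch below measures the empirical distortion of a random linear map on a finite point cloud. It assumes that Φ denotes an M × N matrix with i.i.d. Gaussian entries scaled by 1/√M, which is the usual setting for this kind of guarantee; the summary itself does not define Φ or M. The function name `random_projection_distortion` and the data are hypothetical.

```python
import numpy as np

def random_projection_distortion(X, M, seed=0):
    """Worst-case relative change in squared pairwise distance under a
    random Gaussian projection from R^N to R^M (assumed form of Phi)."""
    rng = np.random.default_rng(seed)
    N = X.shape[1]
    # Scaling by 1/sqrt(M) makes E||Phi v||^2 equal to ||v||^2.
    Phi = rng.standard_normal((M, N)) / np.sqrt(M)
    i, j = np.triu_indices(X.shape[0], k=1)
    D = X[i] - X[j]                        # unnormalized pairwise differences
    # Ratio of squared distances after and before projection; assumes the
    # points are distinct so no denominator is zero.
    ratio = np.sum((D @ Phi.T) ** 2, axis=1) / np.sum(D ** 2, axis=1)
    return float(np.max(np.abs(ratio - 1.0)))

X = np.random.default_rng(1).standard_normal((40, 200))   # hypothetical point cloud
for M in (5, 50, 150):
    print(M, random_projection_distortion(X, M))
```

The printed distortion should generally shrink as M grows, which is the behavior the lower bound on M formalizes. A random Φ is not adapted to the data, whereas the framework summarized above constructs Ψ deterministically from the secant set.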