Abstract

The Johnson–Lindenstrauss lemma is one of the cornerstone results in dimensionality reduction. A common formulation of it is that there exists a random linear mapping $f : \mathbb{R}^n \rightarrow \mathbb{R}^m$ such that for any vector $x \in \mathbb{R}^n$, $f$ preserves its norm to within a factor of $(1 \pm \varepsilon)$ with probability $1 - \delta$ if $m = \Theta(\varepsilon^{-2} \lg(1/\delta))$. Much effort has gone into developing fast embedding algorithms, with the Fast Johnson–Lindenstrauss transform of Ailon and Chazelle being one of the most well-known techniques. The current fastest algorithm that yields the optimal $m = \mathcal{O}(\varepsilon^{-2} \lg(1/\delta))$ dimensions has an embedding time of $\mathcal{O}(n \lg n + \varepsilon^{-2} \lg^3(1/\delta))$. An exciting approach towards improving this, due to Hinrichs and Vybiral, is to use a random $m \times n$ Toeplitz matrix for the embedding. Using the Fast Fourier Transform, the embedding of a vector can then be computed in $\mathcal{O}(n \lg m)$ time. The big question is of course whether $m = \mathcal{O}(\varepsilon^{-2} \lg(1/\delta))$ dimensions suffice for this technique. If so, this would end a decades-long quest to obtain faster and faster Johnson–Lindenstrauss transforms. The current best analysis of the embedding of Hinrichs and Vybiral shows that $m = \mathcal{O}(\varepsilon^{-2} \lg^2(1/\delta))$ dimensions suffice. The main result of this paper is a proof that this analysis unfortunately cannot be tightened any further, i.e., there exist vectors requiring $m = \Omega(\varepsilon^{-2} \lg^2(1/\delta))$ dimensions for the Toeplitz approach to work.
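
To make the Toeplitz approach concrete, the following is a minimal sketch (Python/NumPy; the function name and normalization are our illustration, not code from the paper) of embedding one vector with a random Toeplitz sign matrix composed with a random diagonal sign matrix, in the style of Hinrichs and Vybiral. For simplicity it uses a single FFT of length $n + m - 1$, which costs $\mathcal{O}((n + m) \lg(n + m))$; the $\mathcal{O}(n \lg m)$ bound quoted above is obtained by the standard refinement of convolving $x$ block by block with FFTs of length $\mathcal{O}(m)$.

```python
import numpy as np

def toeplitz_jl_embed(x, m, rng):
    """Map x in R^n to R^m via f(x) = T D x / sqrt(m), where T is an m x n
    Toeplitz matrix with i.i.d. +-1 entries and D is a random diagonal
    sign matrix (a sketch of the Hinrichs-Vybiral construction)."""
    n = x.shape[0]
    L = n + m - 1
    t = rng.choice([-1.0, 1.0], size=L)  # the n + m - 1 diagonals of T
    d = rng.choice([-1.0, 1.0], size=n)  # the diagonal of D

    # T (D x) equals a length-m slice of the circular convolution of t with
    # the zero-padded vector D x, so a single length-L FFT computes it:
    # with T[i, j] = t[i - j + n - 1], entry i of T D x is conv[i + n - 1].
    x_pad = np.zeros(L)
    x_pad[:n] = d * x
    conv = np.fft.irfft(np.fft.rfft(t) * np.fft.rfft(x_pad), L)
    return conv[n - 1 : n - 1 + m] / np.sqrt(m)

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
y = toeplitz_jl_embed(x, m=256, rng=rng)
print(np.linalg.norm(y) / np.linalg.norm(x))  # close to 1 with high probability
```

The $1/\sqrt{m}$ scaling makes the map an isometry in expectation; the question studied in the paper is how large $m$ must be for the concentration around this expectation to match the guarantee of a fully random matrix.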

Highlights

  • The performance of many geometric algorithms depends heavily on the dimension of the input data

  • Running the algorithm on the lower-dimensional data uses fewer resources, and an approximate result for the high-dimensional data can be derived from the low-dimensional result

  • Let $X \subset \mathbb{R}^n$ be a set of $N$ vectors. For any $0 < \varepsilon < 1/2$, there exists a map $f : X \rightarrow \mathbb{R}^m$ for some $m = \mathcal{O}(\varepsilon^{-2} \lg N)$ such that all pairwise distances in $X$ are preserved to within a factor of $(1 \pm \varepsilon)$ (stated precisely below)
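
Precisely, the guarantee is the standard Johnson–Lindenstrauss one (presumably the Theorem 1 referenced in the introduction below): for all $x, y \in X$,

$$(1 - \varepsilon)\,\Vert x - y \Vert_2 \;\le\; \Vert f(x) - f(y) \Vert_2 \;\le\; (1 + \varepsilon)\,\Vert x - y \Vert_2.$$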

Summary

Introduction

The performance of many geometric algorithms depends heavily on the dimension of the input data. Dimensionality reduction that approximately preserves pairwise Euclidean distances has found uses in a wide variety of applications, including nearest-neighbour search [2, 13], clustering [6, 8], linear programming [23], streaming algorithms [20], and compressed sensing. The standard technique for constructing a map with the properties of Theorem 1 is the following: let $A$ be an $m \times n$ matrix with entries independently sampled as either $\mathcal{N}(0, 1)$ random variables (as in [10]) or Rademacher (uniform among $\{-1, +1\}$) random variables (as in [1]), and set $f(x) = Ax/\sqrt{m}$. Using linearity of $f$ and a union bound over all pairs $x, y \in X$, the probability that all pairwise distances (i.e. the norms of the vectors $x - y$) are preserved can be shown to be at least $1/2$, e.g. by setting the failure probability for a single vector to $\delta = 1/N^2$, since there are fewer than $N^2/2$ pairs.
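
For illustration, here is a minimal sketch of this dense construction (Python/NumPy; the constant $8$ in the choice of $m$ is illustrative, not the optimal one):

```python
import numpy as np

def dense_jl_embed(X, eps, rng):
    """Project the rows of X (N points in R^n) down to m = O(eps^-2 lg N)
    dimensions with a dense random Rademacher matrix, as described above."""
    N, n = X.shape
    # m chosen so the per-pair failure probability is about 1/N^2; a union
    # bound over the < N^2/2 pairs then leaves failure probability < 1/2.
    m = int(np.ceil(8 * np.log(N) / eps**2))
    # Entries uniform in {-1, +1}; the 1/sqrt(m) scaling makes the map an
    # isometry in expectation.
    A = rng.choice([-1.0, 1.0], size=(m, n))
    return X @ A.T / np.sqrt(m)

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10_000))
Y = dense_jl_embed(X, eps=0.25, rng=rng)
# Spot-check one pairwise distance: the ratio should lie in [0.75, 1.25].
print(np.linalg.norm(Y[0] - Y[1]) / np.linalg.norm(X[0] - X[1]))
```

The drawback of this dense construction is the $\mathcal{O}(nm)$ embedding time, which is exactly what fast transforms such as the Ailon–Chazelle transform and the Toeplitz approach aim to reduce.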

Time Complexity
Lower Bound for One Vector
Lower Bounding $\Gamma_k$