Abstract

We derandomize Valiant’s (J ACM 62, Article 13, 2015) subquadratic-time algorithm for finding outlier correlations in binary data. This demonstrates that it is possible to perform a deterministic subquadratic-time similarity join of high dimensionality. Our derandomized algorithm gives deterministic subquadratic scaling essentially for the same parameter range as Valiant’s randomized algorithm, but the precise constants we save over quadratic scaling are more modest. Our main technical tool for derandomization is an explicit family of correlation amplifiers built via a family of zigzag-product expanders by Reingold et al. (Ann Math 155(1):157–187, 2002). We say that a function f : {−1, 1}^d → {−1, 1}^D is a correlation amplifier with threshold 0 ≤ τ ≤ 1, error γ ≥ 1, and strength p an even positive integer if for all pairs of vectors x, y ∈ {−1, 1}^d it holds that (i) |⟨x, y⟩| < τd implies |⟨f(x), f(y)⟩| ≤ (τγ)^p D; and (ii) |⟨x, y⟩| ≥ τd implies (⟨x, y⟩/(γd))^p D ≤ ⟨f(x), f(y)⟩ ≤ (γ⟨x, y⟩/d)^p D.
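
To make the definition concrete, here is a minimal sketch in Python with NumPy (our own illustration, not part of the paper; the names satisfies_amplifier_properties and sampled_amplifier are hypothetical). It checks properties (i) and (ii) for a single pair of vectors, and includes a simple sampling-based amplifier of the randomized kind the paper derandomizes, where each output coordinate is a product of p sampled input coordinates.

```python
import numpy as np

def satisfies_amplifier_properties(x, y, fx, fy, tau, gamma, p):
    """Check properties (i) and (ii) of the correlation-amplifier definition
    for one pair x, y in {-1,1}^d with images fx, fy in {-1,1}^D."""
    d, D = len(x), len(fx)
    ip, amp_ip = int(np.dot(x, y)), int(np.dot(fx, fy))
    if abs(ip) < tau * d:
        # (i) background pairs: amplified correlation stays below (tau*gamma)^p
        return abs(amp_ip) <= (tau * gamma) ** p * D
    # (ii) outlier pairs: amplified inner product is sandwiched between
    # (ip/(gamma*d))^p * D and (gamma*ip/d)^p * D (p is even, so both are >= 0)
    return (ip / (gamma * d)) ** p * D <= amp_ip <= (gamma * ip / d) ** p * D

def sampled_amplifier(idx):
    """Randomized amplification for intuition only. idx is a fixed (D, p)
    array of coordinate indices; the SAME idx must be reused for every input
    vector so that the result is a single function f. Each output coordinate
    is a product of p coordinates, so over the random choice of idx,
    E[f(x)_j * f(y)_j] = (<x,y>/d)^p."""
    def f(x):
        return np.prod(x[idx], axis=1)
    return f
```

For instance, one can draw idx = np.random.default_rng(0).integers(0, d, size=(D, p)) once, set f = sampled_amplifier(idx), and test pairs with satisfies_amplifier_properties: the sampled map satisfies the two properties only with high probability for a fixed pair, whereas the paper’s explicit expander-based construction guarantees them deterministically for all pairs.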

Highlights

  • We consider the task of identifying outlier-correlated pairs from large collections of weakly correlated binary vectors in {−1, 1}^d

  • The main result of this paper is that sufficiently powerful explicit amplifiers exist to find outlier correlations in deterministic subquadratic time

  • As a corollary we obtain a deterministic algorithm for finding outlier correlations in subquadratic time using bucketing and fast matrix multiplication

Summary

Introduction

We consider the task of identifying outlier-correlated pairs from large collections of weakly correlated binary vectors in {−1, 1}^d. Given sets X and Y of such vectors and thresholds 0 < τ < ρ < 1, our task is to output all outlier pairs (x, y) ∈ X × Y with ⟨x, y⟩ ≥ ρd, subject to the assumption that at most q of the pairs (x, y) ∈ X × Y satisfy ⟨x, y⟩ > τd.

Remark: This setting of binary vectors and (Pearson) correlation is directly motivated, among others, by the connection to Hamming distance.

Algorithms whose running time degrades as the outlier correlation ρ becomes small suffer from a “curse of weak outliers”; our interest is in algorithms that avoid this curse and run in subquadratic time essentially independently of the magnitude of ρ, provided that ρ is sufficiently separated from τ. Such an ability to identify weak outliers from large amounts of data is useful, among others, in machine learning from noisy data. A strategy of this form is oblivious to q until we start searching inside the buckets, which enables adjusting the bucketing parameters based on the number of large aggregate inner products.
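
The following is a minimal sketch of such a bucketing strategy (our own simplified Python/NumPy illustration; the function name, bucket_size, agg_threshold, and the amplifier f are hypothetical, and the paper’s precise parameterization and use of fast matrix multiplication are developed in the sections below): amplify all vectors, sum each bucket of amplified vectors into one aggregate vector, compare all bucket pairs with a single matrix multiplication, and search by brute force only inside bucket pairs whose aggregate inner product is large.

```python
import numpy as np

def outliers_by_bucketing(X, Y, f, bucket_size, agg_threshold, rho):
    """Simplified bucketing strategy (illustration only).

    X, Y: (n, d) arrays of +-1 vectors, with n divisible by bucket_size.
    f:    amplifier mapping a length-d vector to a length-D vector.
    Returns all pairs (i, j) with <X[i], Y[j]> >= rho * d that lie in a
    bucket pair whose aggregate inner product reaches agg_threshold."""
    n, d = X.shape
    FX = np.stack([f(x) for x in X])   # amplified copies of the inputs
    FY = np.stack([f(y) for y in Y])
    nb = n // bucket_size
    # sum each bucket of amplified vectors into one aggregate vector
    AX = FX.reshape(nb, bucket_size, -1).sum(axis=1)
    AY = FY.reshape(nb, bucket_size, -1).sum(axis=1)
    G = AX @ AY.T                      # one matrix multiplication over buckets
    out = []
    for a, b in zip(*np.nonzero(np.abs(G) >= agg_threshold)):
        # brute-force search inside the flagged bucket pair only
        for i in range(a * bucket_size, (a + 1) * bucket_size):
            for j in range(b * bucket_size, (b + 1) * bucket_size):
                if np.dot(X[i], Y[j]) >= rho * d:
                    out.append((i, j))
    return out
```

Amplification is what makes the aggregates informative: after amplification, the at most q outlier pairs dominate the aggregate inner products, while the many background pairs contribute only a small amount below the threshold, so few bucket pairs need to be searched.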

Randomized Amplification
Explicit Amplification
Our Results
Overview and Discussion of Techniques
Related Work and Applications
Preliminaries
Explicit Amplifiers by Approximate Squaring
Preliminaries on Expansion and Mixing
Main Construction
Copy-and-Truncate Preprocessing of the Input Dimension
Completing the Proof of Theorem 1
The Algorithm
Parameterization and Correctness
Running Time
The Light Bulb Problem
Learning Parities with Noise
Nonconstructive Existence and a Lower Bound
Low-Dimensional Amplifiers Exist
