Sparse Regression via Range Counting.

Jean Cardinal ,Aurélien Ooms

doi:10.4230/lipics.swat.2020.20

Abstract

The sparse regression problem, also known as best subset selection problem, can be cast as follows: Given a set S of n points in ℝ^d, a point y∈ ℝ^d, and an integer 2 ≤ k ≤ d, find an affine combination of at most k points of S that is nearest to y. We describe a O(n^{k-1} log^{d-k+2} n)-time randomized (1+e)-approximation algorithm for this problem with d and e constant. This is the first algorithm for this problem running in time o(n^k). Its running time is similar to the query time of a data structure recently proposed by Har-Peled, Indyk, and Mahabadi (ICALP'18), while not requiring any preprocessing. Up to polylogarithmic factors, it matches a conditional lower bound relying on a conjecture about affine degeneracy testing. In the special case where k = d = O(1), we provide a simple O_δ(n^{d-1+δ})-time deterministic exact algorithm, for any δ > 0. Finally, we show how to adapt the approximation algorithm for the sparse linear regression and sparse convex regression problems with the same running time, up to polylogarithmic factors.

Full Text