We consider a sparse high-dimensional regression model where the goal is to recover a k-sparse unknown binary vector β* from n noisy linear observations of the form Y = Xβ* + W ∈ ℝ^n, where X ∈ ℝ^{n×p} has i.i.d. N(0,1) entries and W ∈ ℝ^n has i.i.d. N(0,σ²) entries. In the high signal-to-noise ratio and sublinear sparsity regime, the order of the sample size needed to recover the unknown vector information-theoretically is known to be n* := 2k log p / log(k/σ² + 1), yet no polynomial-time algorithm is known to succeed unless n > n_alg := (2k + σ²) log p. In this work, we offer a series of results investigating multiple computational and statistical aspects of the recovery task in the regime n ∈ [n*, n_alg]. First, we establish a novel information-theoretic property of the MLE of the problem occurring around n = n* samples, which we coin an "all-or-nothing behavior": when n > n*, the MLE recovers the support of β* almost perfectly, while if n < n*, it fails to recover any fraction of the support correctly. Second, in an attempt to understand the computational hardness in the regime n ∈ [n*, n_alg], we prove that an Overlap Gap Property (OGP) phase transition occurs in the landscape of the MLE at order n_alg samples: for constants c, C > 0, OGP appears in the landscape of the MLE when n < c·n_alg, while it disappears when n > C·n_alg. OGP is a geometric "disconnectivity" property that initially appeared in the theory of spin glasses and is known to suggest algorithmic hardness when it occurs. Finally, using certain technical results obtained in establishing the OGP phase transition, we additionally establish various novel positive and negative algorithmic results for the recovery task of interest, including the failure of LASSO with access to n < c·n_alg samples and the success of a simple local search method with access to n > C·n_alg samples.
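As an illustration of the statistical-computational gap, the two sample-size thresholds can be evaluated numerically. The sketch below uses the formulas from the abstract; the parameter values k = 100, p = 10⁶, σ² = 1 are our own illustrative choices, not taken from the paper.

```python
import math

def n_star(k: int, p: int, sigma2: float) -> float:
    """Information-theoretic threshold: n* = 2k log p / log(k/sigma^2 + 1)."""
    return 2 * k * math.log(p) / math.log(k / sigma2 + 1)

def n_alg(k: int, p: int, sigma2: float) -> float:
    """Conjectured algorithmic threshold: n_alg = (2k + sigma^2) log p."""
    return (2 * k + sigma2) * math.log(p)

# Illustrative (hypothetical) parameter values.
k, p, sigma2 = 100, 10**6, 1.0
print(f"n*    ~ {n_star(k, p, sigma2):.0f}")   # information-theoretic threshold
print(f"n_alg ~ {n_alg(k, p, sigma2):.0f}")    # algorithmic threshold
```

For these values n_alg exceeds n* by roughly a factor of log(k/σ² + 1)/ (up to the additive σ² term), which is the hard regime n ∈ [n*, n_alg] the paper studies.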