In many application areas, ranging from bioinformatics to imaging, we are faced with the following question: can we recover a sparse vector $x_{o} \in \mathbb{R}^{N}$ from its undersampled set of noisy observations $y \in \mathbb{R}^{n}$, $y = A x_{o} + w$? The last decade has witnessed a surge of algorithms and theoretical results addressing this question. One of the most popular schemes is $\ell_{p}$-regularized least squares, given by the following formulation: $\hat{x}(\gamma, p) \in \arg\min_{x} (1/2)\| y - Ax \|_{2}^{2} + \gamma \| x \|_{p}^{p}$, where $p \in [0, 1]$. Among these optimization problems, the case $p = 1$, also known as LASSO, is the most widely used in practice, for the following two reasons. First, thanks to extensive studies in high-dimensional statistics and compressed sensing, we have a clear picture of LASSO's performance. Second, it is convex, and efficient algorithms exist for finding its global minima. Unfortunately, neither of these two properties holds for $0 \leq p < 1$. Nevertheless, these problems remain appealing because of the following folklore theorems in high-dimensional statistics. First, $\hat{x}(\gamma, p)$ is closer to $x_{o}$ than $\hat{x}(\gamma, 1)$. Second, if we employ iterative methods that aim to converge to a local minimizer of $(1/2)\| y - Ax \|_{2}^{2} + \gamma \| x \|_{p}^{p}$, then under a good initialization these algorithms converge to a solution that is still closer to $x_{o}$ than $\hat{x}(\gamma, 1)$. Despite ample empirical evidence supporting these folklore theorems, theoretical progress toward establishing them has been very limited. This paper studies these folklore theorems and establishes their scope of validity. Starting with the approximate message passing (AMP) algorithm as a heuristic method for solving $\ell_{p}$-regularized least squares, we study the following questions. First, what is the impact of initialization on the performance of the algorithm? Second, when does the algorithm recover the sparse signal $x_{o}$ under a "good" initialization? Third, when does the algorithm converge to the sparse signal regardless of the initialization? Studying these questions not only sheds light on the second folklore theorem, but also leads us to an answer to the first one, i.e., the performance of the global optimum $\hat{x}(\gamma, p)$. For that purpose, we employ replica analysis (a widely accepted heuristic in statistical physics for analyzing large disordered systems) to show the connection between the solution of AMP and $\hat{x}(\gamma, p)$ in the asymptotic setting. This enables us to compare the accuracy of $\hat{x}(\gamma, p)$ and $\hat{x}(\gamma, 1)$. In particular, we present an accurate characterization of the phase transition and noise sensitivity of $\ell_{p}$-regularized least squares for every $0 \leq p \leq 1$. Our results in the noiseless setting confirm that $\ell_{p}$-regularized least squares (if $\gamma$ is tuned optimally) exhibits the same phase transition for every $0 \leq p < 1$, and that this phase transition is much better than that of LASSO. Furthermore, we show that in the noisy setting there is a major difference between the performance of $\ell_{p}$-regularized least squares for different values of $p$. For instance, we show that for very small and very large measurement noise, $p = 0$ and $p = 1$ outperform the other values of $p$, respectively.
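To make the AMP iteration mentioned above concrete, the following is a minimal sketch for the best-understood case $p = 1$, where the per-iteration denoiser is soft thresholding. The fixed threshold `theta`, the iteration count, and the `x_init` argument are illustrative assumptions introduced here; the paper's analysis assumes optimally tuned parameters, covers general $p \in [0, 1]$, and studies the role of initialization far more carefully than this sketch does.

```python
import numpy as np


def soft_threshold(v, theta):
    """Soft thresholding: proximal operator of theta * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)


def amp_lasso(y, A, theta, iters=100, x_init=None):
    """AMP-style iteration for the p = 1 (LASSO) case.

    Sketch only: a fixed threshold `theta` is assumed, and `x_init`
    exposes the dependence on initialization that the paper studies.
    """
    n, N = A.shape
    x = np.zeros(N) if x_init is None else x_init.copy()
    z = y - A @ x
    for _ in range(iters):
        pseudo_data = x + A.T @ z                  # effective observation of x_o
        x_new = soft_threshold(pseudo_data, theta)
        # Onsager correction: proportional to the fraction of coordinates
        # surviving the threshold, scaled by the number of measurements n.
        onsager = (z / n) * np.count_nonzero(x_new)
        z = y - A @ x_new + onsager
        x = x_new
    return x


if __name__ == "__main__":
    # Toy usage example with synthetic data (illustrative sizes only).
    rng = np.random.default_rng(0)
    n, N, k = 100, 200, 10
    A = rng.normal(size=(n, N)) / np.sqrt(n)
    x_o = np.zeros(N)
    x_o[:k] = rng.normal(size=k)
    y = A @ x_o + 0.01 * rng.normal(size=n)
    x_hat = amp_lasso(y, A, theta=0.1)
    print("relative error:", np.linalg.norm(x_hat - x_o) / np.linalg.norm(x_o))
```

For $0 \leq p < 1$, the soft-thresholding step would be replaced by a (generally non-convex) thresholding function associated with $\gamma \| x \|_{p}^{p}$, which is precisely where the choice of initialization begins to matter.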