We consider the problem of recovering a complex signal ${x} \in \mathbb{C}^{n}$ from $m$ intensity measurements of the form $|{a}_{i}^{\mathrm{H}} {x}|$, $1 \leq i \leq m$, where ${a}_{i}^{\mathrm{H}}$ is the $i$th row of the measurement matrix ${A} \in \mathbb{C}^{m \times n}$. Our main focus is on the case where the measurement vectors are unconstrained and the signal ${x}$ is exactly $K$-sparse, the so-called general compressive phase retrieval problem. We introduce PhaseCode, a novel family of fast and efficient algorithms based on a sparse-graph coding framework. We show that in the noiseless case, when the support of the signal is uniformly random, the PhaseCode algorithm can recover an arbitrarily-close-to-one fraction of the $K$ nonzero signal components using only slightly more than $4K$ measurements, with order-optimal time and memory complexity of $\Theta(K)$.$^{1}$ It is known that the fundamental limit on the number of measurements for the compressive phase retrieval problem is $4K - o(K)$ for the more difficult task of recovering the signal exactly and with no assumptions on its support distribution. Thus, under a mild relaxation of the conditions, PhaseCode is the first constructive capacity-approaching compressive phase retrieval algorithm; moreover, it is order-optimal in time and memory complexity. Furthermore, we show that for any signal ${x}$, PhaseCode can recover a random $(1-p)$-fraction of the nonzero components of ${x}$ with high probability, where $p$ can be made arbitrarily close to zero, with sample complexity $m = c(p)K$ for a small, precisely computable constant $c(p)$ depending on $p$, again with optimal time and memory complexity.
As a result, assuming that the magnitudes of the nonzero components of ${x}$ are lower bounded by $\Theta(1)$ and upper bounded by $\Theta(K^{\gamma})$ for some positive constant $\gamma$, we can provide a strong $\ell_{1}$ guarantee for the estimated signal $\hat{{x}}$: $\|\hat{{x}} - {x}\|_{1} \leq p \|{x}\|_{1}(1 + o(1))$, where $p$ can be made arbitrarily close to zero. As one instance, the PhaseCode algorithm can provably recover, with high probability, a random $(1 - 10^{-7})$-fraction of the significant signal components using at most $m = 14K$ measurements. Next, motivated by important practical classes of optical systems, we consider a “Fourier-friendly” constrained measurement setting, in which the measurement matrix is a cascade of Fourier matrices (corresponding to optical lenses) and diagonal matrices (corresponding to diffraction-mask patterns), and show that the performance of PhaseCode in this setting matches that of the unconstrained setting when the signal is sparse in the Fourier domain with uniform support. Finally, we tackle the compressive phase retrieval problem in the presence of noise, where the measurements take the form $y_{i} = |{a}_{i}^{\mathrm{H}} {x}|^{2} + w_{i}$, with $w_{i}$ the additive noise on the $i$th measurement. We assume that the signal is quantized: each nonzero component can take $L_{m}$ possible magnitudes and $L_{p}$ possible phases. We consider the regime where $K = \beta n^{\delta}$ with $\delta \in (0,1)$. We use the same PhaseCode architecture as in the noiseless case and robustify it using two schemes: the almost-linear scheme and the sublinear scheme.
We prove that, with high probability, the almost-linear scheme recovers ${x}$ with sample complexity $\Theta(K \log(n))$ and computational complexity $\Theta(L_{m} L_{p} n \log(n))$, and the sublinear scheme recovers ${x}$ with sample complexity $\Theta(K \log^{3}(n))$ and computational complexity $\Theta(L_{m} L_{p} K \log^{3}(n))$. Throughout, we provide extensive simulation results that validate the practical power of our proposed algorithms in the sparse unconstrained and Fourier-friendly measurement settings, in both noiseless and noisy scenarios.

$^{1}$Here we define the notation $\mathcal{O}(\cdot)$, $\Theta(\cdot)$, and $\Omega(\cdot)$: $f = \mathcal{O}(g)$ if and only if there exists a constant $C_{1} > 0$ such that $|f/g| < C_{1}$; $f = \Theta(g)$ if and only if there exist two constants $C_{1}, C_{2} > 0$ such that $C_{1} < |f/g| < C_{2}$; and $f = \Omega(g)$ if and only if there exists a constant $C_{1} > 0$ such that $|f/g| > C_{1}$.
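As a concrete illustration of the measurement model described above, the following minimal sketch generates noisy intensity measurements $y_{i} = |{a}_{i}^{\mathrm{H}} {x}|^{2} + w_{i}$ of a $K$-sparse complex signal with uniformly random support. The specific dimensions, the Gaussian choice of measurement matrix, and the noise level are illustrative assumptions made only for this example, not prescriptions of the paper.

```python
import numpy as np

# Illustrative sketch (assumed parameters, not from the paper):
# a K-sparse complex signal x, an unconstrained measurement matrix A,
# and noisy intensity measurements y_i = |a_i^H x|^2 + w_i.
rng = np.random.default_rng(0)
n, K = 256, 8
m = 14 * K  # m = 14K measurements, matching the sample-complexity instance above

# K-sparse complex signal with uniformly random support.
x = np.zeros(n, dtype=complex)
support = rng.choice(n, size=K, replace=False)
x[support] = rng.standard_normal(K) + 1j * rng.standard_normal(K)

# Unconstrained complex measurement matrix; row i plays the role of a_i^H.
A = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)

# Noisy intensity measurements (set w = 0 to recover the noiseless setting,
# where only the magnitudes |a_i^H x| are observed).
w = 0.01 * rng.standard_normal(m)
y = np.abs(A @ x) ** 2 + w
```

Note that the phases of ${a}_{i}^{\mathrm{H}} {x}$ are discarded entirely; only the $m$ real-valued intensities $y_{i}$ are available to the recovery algorithm.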