The seminal work of Cohen and Peng [10] (STOC 2015) introduced Lewis weight sampling to the theoretical computer science community, yielding fast row sampling algorithms for approximating \(d\)-dimensional subspaces of \(\ell_{p}\) up to \((1+\varepsilon)\) relative error. Prior works have extended this important primitive to other settings, such as the online coreset and sliding window models [4] (FOCS 2020). However, these results hold only for \(p\in\{1,2\}\), and the results for \(p=1\) require a suboptimal \(\tilde{O}(d^{2}/\varepsilon^{2})\) sample complexity. In this work, we design the first nearly optimal \(\ell_{p}\) subspace embeddings for all \(p\in(0,\infty)\) in the online coreset and sliding window models. In both models, our algorithms store \(\tilde{O}(d/\varepsilon^{2})\) rows for \(p\in(0,2)\) and \(\tilde{O}(d^{p/2}/\varepsilon^{2})\) rows for \(p\in(2,\infty)\). This answers a substantial generalization of the main open question of [4], gives the first results for all \(p\notin\{1,2\}\), and achieves nearly optimal sample complexities for all \(p\). Towards our result, we give the first analysis of “one-shot” Lewis weight sampling, in which rows are sampled proportionally to their Lewis weights, showing that it achieves a sample complexity of \(\tilde{O}(d^{p/2}/\varepsilon^{2})\) rows for \(p>2\). Previously, this sampling scheme was only known to achieve a sample complexity of \(\tilde{O}(d^{p/2}/\varepsilon^{5})\) [10], whereas the bound of \(\tilde{O}(d^{p/2}/\varepsilon^{2})\) was known only for a more sophisticated recursive sampling algorithm [20, 32]. Since the recursive sampling strategy cannot be implemented in an online setting, an analysis of one-shot Lewis weight sampling is necessary. Perhaps surprisingly, our analysis crucially uses a novel connection to online numerical linear algebra, even for offline Lewis weight sampling. As an application, we obtain the first online coreset algorithms for \((1+\varepsilon)\) approximation of important generalized linear models, such as logistic regression and \(p\)-probit regression. Our upper bounds are parameterized by a complexity parameter \(\mu\) introduced by [31], and we also provide the first lower bounds, showing that a linear dependence on \(\mu\) is necessary.
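For context, a brief sketch of the sampling primitive, using standard definitions from the Lewis weight sampling literature rather than this paper's exact notation or sampling rates: for \(A \in \mathbb{R}^{n\times d}\) with rows \(a_1,\dots,a_n\), the \(\ell_p\) Lewis weights \(w_1,\dots,w_n\) are the unique weights satisfying
\[
w_i = \Big( a_i^{\top} \big( A^{\top} W^{1-2/p} A \big)^{-1} a_i \Big)^{p/2}, \qquad W = \operatorname{diag}(w_1,\dots,w_n),
\]
and one-shot Lewis weight sampling keeps each row \(i\) independently with probability \(p_i = \min\{1, \beta w_i\}\), rescaling every kept row by \(p_i^{-1/p}\) so that \(\|SAx\|_p^p\) is an unbiased estimator of \(\|Ax\|_p^p\) for each fixed \(x\). Since \(\sum_i w_i = d\), the oversampling factor \(\beta\) can be chosen so that the expected number of stored rows matches the sample complexities stated above.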