We consider the problem of reconstructing an unknown function \(u\in L^2(D,\mu )\) from its evaluations at given sampling points \(x^1,\dots ,x^m\in D\), where \(D\subset {\mathbb {R}}^d\) is a general domain and \(\mu \) is a probability measure. The approximation is picked from a linear space \(V_n\) of interest, where \(n=\dim (V_n)\). Recent results (Cohen and Migliorati in SMAI J Comput Math 3:181–203, 2017, Doostan and Hampton in Comput Methods Appl Mech Eng 290:73–97, 2015, Jakeman et al. in Math Comput 86:1913–1947, 2017) have revealed that certain weighted least-squares methods achieve near-best (or instance-optimal) approximation with a sampling budget \(m\) that is proportional to \(n\), up to a logarithmic factor \(\ln (2n/\varepsilon )\), where \(\varepsilon >0\) is a probability of failure. The sampling points should be picked at random according to a well-chosen probability measure \(\sigma \) whose density is given by the inverse Christoffel function, which depends on both \(V_n\) and \(\mu \). While this approach is greatly facilitated when \(D\) and \(\mu \) have tensor product structure, it becomes problematic for domains \(D\) with arbitrary geometry, since the optimal measure depends on an orthonormal basis of \(V_n\) in \(L^2(D,\mu )\) that is not explicitly given, even for simple polynomial spaces. Therefore, sampling according to this measure is not practically feasible. One computational solution recently proposed in Adcock and Huybrechs (Approximating smooth, multivariate functions on irregular domains, Forum of Mathematics, Sigma, Cambridge University Press, Cambridge, 2020) relies on using the restrictions of an orthonormal basis of \(V_n\) defined on a simpler bounding domain and sampling according to the original probability measure \(\mu \), at the cost of giving up the optimal sampling budget \(m\sim n\).
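For concreteness, the objects mentioned above can be sketched as follows, in the standard form used for optimal weighted least squares (as in Cohen and Migliorati, 2017). The symbols \(L_j\), \(k_n\), \(w\) and \(u_n\) are not introduced in the text above and are notational assumptions of this sketch: \((L_1,\dots ,L_n)\) denotes any orthonormal basis of \(V_n\) in \(L^2(D,\mu )\), \(k_n\) the inverse Christoffel function, \(w\) the least-squares weight and \(u_n\) the resulting approximation.

\[
k_n(x)=\sum_{j=1}^n |L_j(x)|^2,\qquad d\sigma =\frac{k_n}{n}\,d\mu ,\qquad w(x)=\frac{n}{k_n(x)},
\]
\[
u_n=\mathop{\mathrm{argmin}}_{v\in V_n}\ \frac{1}{m}\sum_{i=1}^m w(x^i)\,|u(x^i)-v(x^i)|^2 ,\qquad x^1,\dots ,x^m \ \text{drawn independently according to } \sigma .
\]

Since \(\int _D k_n\,d\mu =n\) by orthonormality, \(\sigma \) is indeed a probability measure, and \(w\,d\sigma =d\mu \), so the weighted discrete sum is an unbiased estimate of the squared \(L^2(D,\mu )\) norm of \(u-v\). The function \(k_n\) does not depend on the choice of orthonormal basis, but evaluating it requires one, which is the practical obstacle on general domains.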
In this paper, we discuss practical sampling strategies, which amount to using a perturbed measure \(\widetilde{\sigma }\) that can be computed in an offline stage, not involving the measurement of \(u\), as recently proposed in Adcock and Cardenas (SIAM J Math Data Sci 2:607–630, 2020) and Migliorati (IMA J Numer Anal, 2020. https://doi.org/10.1093/imanum/draa023). We show that near-best approximation is attained by the resulting weighted least-squares method at near-optimal sampling budget, and we discuss multilevel approaches that preserve optimality of the cumulative sampling budget when the spaces \(V_n\) are iteratively enriched. These strategies rely on the knowledge of a priori upper bounds \(B(n)\) on the inverse Christoffel function for the space \(V_n\) and the domain \(D\). We establish bounds of the form \(\mathcal O(n^r)\) for spaces \(V_n\) of multivariate algebraic polynomials of given total degree, and for general domains \(D\). The exact growth rate \(r\) depends on the regularity of the domain, in particular \(r=2\) for domains with Lipschitz boundaries and \(r=\frac{d+1}{d}\) for smooth domains.