Petrov–Galerkin formulations with optimal test functions allow for the stabilization of finite element simulations. In particular, given a discrete trial space, the optimal test space induces a numerical scheme delivering the best approximation in terms of a problem-dependent energy norm. This ideal approach has two shortcomings: first, we need to explicitly know the set of optimal test functions; and second, the optimal test functions may have large supports inducing expensive dense linear systems. A concise proposal on how to overcome these shortcomings has been raised during the last decade by the Discontinuous Petrov–Galerkin (DPG) methodology. However, DPG has also some limitations and difficulties: the method requires ultraweak variational formulations, obtained through a hybridization process, which is not trivial to implement at the discrete level.Our motivation is to offer a simpler alternative for the case of parametric PDEs, which can be used with any variational formulation. Indeed, parametric families of PDEs are an example where it is worth investing some (offline) computational effort to obtain stabilized linear systems that can be solved efficiently in an online stage, for a given range of parameters. Therefore, as a remedy for the first shortcoming, we explicitly compute (offline) a function mapping any PDE parameter, to the matrix of coefficients of optimal test functions (in some basis expansion) associated with that PDE parameter. Next, as a remedy for the second shortcoming, we use the low-rank approximation to hierarchically compress the (non-square) matrix of coefficients of optimal test functions. In order to accelerate this process, we train a neural network to learn a critical bottleneck of the compression algorithm (for a given set of PDE parameters). When solving online the resulting (compressed) Petrov–Galerkin formulation, we employ a GMRES iterative solver with inexpensive matrix–vector multiplications thanks to the low-rank features of the compressed matrix. We perform experiments showing that the full online procedure is as fast as an (unstable) Galerkin approach. We illustrate our findings by means of 2D–3D Eriksson–Johnson problems, together with 2D Helmholtz equation.