We present an efficient method for computing A-optimal experimental designs for infinite-dimensional Bayesian linear inverse problems governed by partial differential equations (PDEs). Specifically, we address the problem of optimizing the location of sensors (at which observational data are collected) so as to minimize the uncertainty in the parameters estimated by solving the inverse problem, where the uncertainty is measured by the trace of the posterior covariance. Computing optimal experimental designs (OEDs) is particularly challenging for inverse problems governed by computationally expensive PDE models with infinite-dimensional (or, after discretization, high-dimensional) parameters. To alleviate the computational cost, we exploit the problem structure and build a low-rank approximation of the parameter-to-observable map, preconditioned with the square root of the prior covariance operator. The availability of this low-rank surrogate relieves our method of expensive PDE solves when evaluating the optimal experimental design objective function, i.e., the trace of the posterior covariance, and its derivatives. Moreover, we employ a randomized trace estimator for efficient evaluation of the OED objective function. We control the sparsity of the sensor configuration through a sequence of penalty functions that successively approximate the $\ell_0$-``norm''; this results in binary designs that characterize optimal sensor locations. We present numerical results for inference of the initial condition from spatiotemporal observations in a time-dependent advection-diffusion problem in two and three space dimensions. We find that an optimal design can be computed at a cost, measured in the number of forward PDE solves, that is independent of the parameter and sensor dimensions. Moreover, the numerical optimization problem for finding the optimal design can be solved in a number of interior-point quasi-Newton iterations that is insensitive to the parameter and sensor dimensions. We demonstrate numerically that $\ell_0$-sparsified experimental designs obtained via a continuation method outperform $\ell_1$-sparsified designs.
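As a rough illustration of how the low-rank surrogate and the randomized trace estimator combine to evaluate the OED objective without additional PDE solves, the following minimal sketch uses small dense matrices in place of discretized PDE operators. All names and sizes here (`Gamma_pr`, `Hmisfit`, the factors `U`, `s`, `V`, the noise level `sigma`, the probe count `n_tr`) are illustrative assumptions for a synthetic toy problem, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (hypothetical): parameter dim, number of candidate sensors,
# surrogate rank, and observational noise standard deviation.
n, q, r, sigma = 200, 30, 20, 0.1

# Synthetic SPD prior covariance and its symmetric square root
# (in the PDE setting these would be operators, never formed as matrices).
A = rng.standard_normal((n, n))
Gamma_pr = A @ A.T / n + np.eye(n)
evals, evecs = np.linalg.eigh(Gamma_pr)
Gamma_pr_half = evecs @ np.diag(np.sqrt(evals)) @ evecs.T

# Stand-in low-rank factorization U diag(s) V^T of the prior-preconditioned
# parameter-to-observable map F Gamma_pr^{1/2} (random factors for this toy).
U, _ = np.linalg.qr(rng.standard_normal((q, r)))
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
s = np.sort(rng.uniform(0.1, 10.0, r))[::-1]

def Hmisfit(w):
    """Prior-preconditioned data-misfit Hessian V S U^T diag(w)/sigma^2 U S V^T
    for sensor weights w; applying it needs no PDE solves once U, s, V are built."""
    M = U * s                                   # equals U @ diag(s), shape (q, r)
    core = M.T @ (w[:, None] / sigma**2 * M)    # r x r core matrix
    return V @ core @ V.T

def trace_post_exact(w):
    """Reference value of the A-optimal objective tr(Gamma_post(w)), using
    Gamma_post = Gamma_pr^{1/2} (Hmisfit(w) + I)^{-1} Gamma_pr^{1/2}."""
    Hw = Hmisfit(w)
    return np.trace(Gamma_pr_half @ np.linalg.solve(Hw + np.eye(n), Gamma_pr_half))

def trace_post_randomized(w, n_tr=60):
    """Hutchinson-type randomized estimator: tr(A) is approximated by the
    average of z^T A z over random +/-1 probe vectors z."""
    Hw_plus_I = Hmisfit(w) + np.eye(n)
    Z = rng.choice([-1.0, 1.0], size=(n, n_tr))
    Y = Gamma_pr_half @ np.linalg.solve(Hw_plus_I, Gamma_pr_half @ Z)
    return np.mean(np.sum(Z * Y, axis=0))

w = rng.uniform(0.0, 1.0, q)   # a candidate (relaxed) sensor-weight vector
print(trace_post_exact(w), trace_post_randomized(w))
```

In the PDE-governed setting, the low-rank factors are computed once at a cost in forward and adjoint solves proportional to the numerical rank of the prior-preconditioned map; thereafter, each evaluation of the objective (and, analogously, its gradient with respect to the sensor weights) reduces to dense linear algebra whose size is governed by that rank and the number of probe vectors, which is the mechanism behind the cost being independent of the parameter and sensor dimensions.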