This paper presents a detailed comparison of three methods for emulating CPU-intensive reactive transport models (RTMs): Gaussian processes (GPs), polynomial chaos expansion (PCE), and deep neural networks (DNNs). State-of-the-art open-source libraries are used for each emulation method, while the CPU time incurred by one forward run of the considered RTMs varies from 1 h 30 min to 5 days. Besides direct emulation of the simulated uranium concentration time series, replacing the original RTM by its emulator is also investigated for global sensitivity analysis (GSA), uncertainty propagation, and probabilistic calibration using Markov chain Monte Carlo (MCMC) sampling. The selected DNN is found to be superior to both GPs and PCE in reproducing the input–output behavior of the considered 8-dimensional and 13-dimensional CPU-intensive RTMs, even though the training sets used are small: from 75 to 500 samples. Furthermore, the two PCE variants used, standard PCE and sparse PCE (sPCE), consistently provide the least accuracy while differing little in performance. As a consequence of its better emulation capabilities, the DNN method outperforms the two other methods for uncertainty propagation. For the GSA application, the DNN and GP methods offer equally good approximations of the true first-order and total-order Sobol' sensitivity indices, while PCE does less well. Most surprisingly, despite its superior emulation skills, the DNN approach leads to the worst solution of the considered synthetic inverse problem, which involves 1224 measurement data with low noise. This apparently contradictory behavior of the DNN is at least partially due to the small but complicated deterministic noise that affects DNN-based predictions. Indeed, this complex error structure can drive the emulated solutions far away from the true posterior distribution when the measurement data are of high quality.
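The core idea underlying all three approaches can be illustrated with a minimal sketch: train a cheap surrogate on a small set of forward-model runs, then query the surrogate instead of the expensive model. The example below is a bare-bones GP regressor in numpy (squared-exponential kernel, noise-free interpolation with jitter); the two-input `toy_model` is a hypothetical stand-in for an RTM, and the training-set size of 75 mirrors the smallest set considered in the paper. It is a sketch under these assumptions, not the paper's actual implementation.

```python
import numpy as np

def rbf_kernel(A, B, length=0.2, var=1.0):
    # Squared-exponential covariance between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return var * np.exp(-0.5 * d2 / length**2)

def toy_model(x):
    # Hypothetical stand-in for one CPU-intensive RTM forward run.
    return np.sin(2 * np.pi * x[:, 0]) + 0.5 * x[:, 1] ** 2

rng = np.random.default_rng(0)
X_train = rng.uniform(size=(75, 2))   # small, fixed training set
y_train = toy_model(X_train)

# GP posterior mean: solve (K + jitter*I) alpha = y once, reuse for predictions.
K = rbf_kernel(X_train, X_train) + 1e-8 * np.eye(len(X_train))
alpha = np.linalg.solve(K, y_train)

def gp_predict(X_new):
    # Emulated response at new inputs; each call avoids a full model run.
    return rbf_kernel(X_new, X_train) @ alpha

X_test = rng.uniform(size=(500, 2))
rmse = np.sqrt(np.mean((gp_predict(X_test) - toy_model(X_test)) ** 2))
```

Once `alpha` is precomputed, each emulator query is a single matrix-vector product, which is what makes emulator-based GSA and uncertainty propagation tractable.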
Among the three considered methods, only the GP method retrieves emulated posterior solutions that jointly (1) fit the high-quality measurement data to the appropriate noise level (log-likelihood value) and (2) lie closest to the true model parameter values. Overall, our findings indicate that when the available training set is relatively small (75 to 500 input–output examples) and fixed beforehand, PCE is not the best choice for emulating CPU-intensive RTMs; GPs or DNNs should be preferred instead. However, a DNN can deliver overly biased model calibration results, whereas the GP method performs fairly well across all considered tasks: direct emulation, global sensitivity analysis, uncertainty propagation, and calibration.
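The emulator-based probabilistic calibration discussed above can be sketched as follows: a random-walk Metropolis sampler evaluates the likelihood through the surrogate rather than the expensive forward model. Here the one-parameter exponential-decay `emulator` and all numerical values (noise level, prior bounds, chain length) are hypothetical choices for illustration only; the paper's inverse problem involves 1224 data and higher-dimensional parameter spaces.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 50)

def emulator(theta):
    # Hypothetical surrogate of the forward model for one decay-rate parameter.
    return np.exp(-theta * t)

rng = np.random.default_rng(1)
theta_true, sigma = 2.0, 0.01         # low-noise synthetic data, as in the paper's setup
data = emulator(theta_true) + sigma * rng.normal(size=t.size)

def log_like(theta):
    # Gaussian log-likelihood (constant terms dropped).
    r = data - emulator(theta)
    return -0.5 * np.sum(r**2) / sigma**2

# Random-walk Metropolis with a uniform prior on [0, 5].
theta, ll = 1.0, log_like(1.0)
chain = []
for _ in range(5000):
    prop = theta + 0.05 * rng.normal()
    if 0.0 < prop < 5.0:              # reject proposals outside the prior support
        ll_prop = log_like(prop)
        if np.log(rng.uniform()) < ll_prop - ll:
            theta, ll = prop, ll_prop
    chain.append(theta)

posterior_mean = np.mean(chain[1000:])  # discard burn-in
```

With an accurate surrogate the chain concentrates near the true parameter; a surrogate with a complicated deterministic error structure can instead pull the chain away from the true posterior, which is the failure mode observed for the DNN.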