Electron spin resonance (ESR) spectroscopy in scanning tunneling microscopy (STM) has made it possible to probe the electronic structure of single magnetic atoms and molecules on surfaces with unprecedented energy resolution and to demonstrate coherent manipulation of single spins. Despite this remarkable success, the field could still be greatly advanced by a more quantitative understanding of the physical mechanisms behind ESR-STM. Here, we present a theory of ESR-STM that quantitatively models not only the ESR signal itself, but also the full background tunneling current, from which the ESR signal is derived. Our theory combines Green's function techniques, which describe the electron tunneling, with a quantum master equation for the dynamics of the spin system, and it includes microwave radiation that interacts with both the tunneling current and the magnetic system. We show that this theory quantitatively reproduces the experimental results for a spin-1/2 system (TiH molecules on MgO) across many orders of magnitude in tunneling current, providing access to the relaxation and decoherence rates that govern the spin dynamics due to intrinsic mechanisms and to the applied bias voltage. More importantly, our work establishes that (i) sizable ESR signals, which are a measure of microwave-induced changes in the junction magnetoresistance, require surprisingly high tip spin polarizations, and (ii) the coupling of the magnetization dynamics to the microwave field gives rise to the asymmetric ESR spectra often observed in this spectroscopy. Additionally, our theory provides very specific predictions for the dependence of the relaxation and decoherence times on the bias voltage and the tip-sample distance. Finally, with the help of electromagnetic simulations, we find that the transitions in our ESR-STM experiments, in which the tunnel junction is irradiated by a nearby microwave antenna, can be driven by the ac magnetic field at the junction.
Published by the American Physical Society 2024