In this paper we consider a measure-theoretical formulation of the training of NeurODEs as a mean-field optimal control problem with L²-regularization of the control. We derive first-order optimality conditions for the NeurODE training problem in the form of a mean-field maximum principle, and show that it admits a unique control solution, which is Lipschitz continuous in time. As a consequence of this uniqueness property, the mean-field maximum principle also yields a strong quantitative generalization error estimate for finite-sample approximations. This estimate provides a rigorous justification of a phenomenon that we call coupled descent: the simultaneous decrease of the generalization and training errors. We consider two approaches to the derivation of the mean-field maximum principle. One of them is based on a generalized Lagrange multiplier theorem on convex subsets of spaces of measures, and is arguably much simpler than the derivations currently available in the literature for mean-field optimal control problems. This Lagrange multiplier theorem is also new, and can be considered a result of independent interest.