A simple yet expressive prediction model is an essential ingredient in model-based control and estimation. Models derived from fundamental physical principles may fail to capture the complexity of the actual system dynamics. A potential solution is a physics-informed, or gray-box, model that extends a physics-based model with a data-driven part. Learning the latter can be challenging because of noisy measurements and the lack of full state information. This work presents a method based on Moving Horizon Estimation (MHE) for simultaneous state estimation and training of a black-box submodel, such as a neural network. The method can be used for offline training or applied online for adaptation, without any prior knowledge other than the white-box submodel. We analyze the capabilities of the method in a case study of a two-degree-of-freedom robotic manipulator and show how it can be used for online adaptation to cope with a time-varying model mismatch.