Long-term farming system experiments with comprehensive soil, plant and meteorological data, combined with well-validated dynamic farming system models, provide a powerful means of evaluating the production and environmental implications of agricultural innovations. A current deficiency is the lack of comprehensive validations for long-term crop sequences that capture the dynamics of soil and crop responses without resetting soil conditions, especially for soil mineral nitrogen (Min-N). Given the central role of nitrogen (N) dynamics in both productivity and sustainability, this reduces confidence in the outcomes of scenario analyses of proposed improvements. We used data from a 30-year long-term experiment (LTE) to validate the APSIM model and achieved excellent predictions of soil and crop responses, including the dynamics of Min-N, without resetting soil conditions. A critical step was to ensure that the parameters for the soil organic matter pools matched those measured at the site over the full rooting depth (1.6 m), rather than limiting measured values to surface layers and relying on default parameters in deeper layers. In subsequent scenario analyses of agronomic innovations including fallow weed control, earlier sowing and improved N fertiliser strategies, the validated model predicted that specific combinations of synergistic innovations, previously investigated separately in shorter-term experiments, could increase average annual productivity by 1.2 t ha⁻¹ (30%), water use efficiency (WUE) by 2.0 kg ha⁻¹ mm⁻¹ (30%) and N use efficiency (NUE) by 13 kg kg⁻¹ (21%), and could reduce average annual N leaching by 8 kg N ha⁻¹ (33%) and soil organic matter loss (0–10 cm) by 3.1 t ha⁻¹ (31%).
Our study represents a rare case of model validation that captures the dynamics of soil water, Min-N, biomass and grain yield across a long-term, diverse crop sequence, and so provides confidence in scenario analyses of the production and environmental consequences of agronomic innovations. LTEs are a valuable resource for such validation, and conclusions drawn from simulation studies that lack comprehensive validation must be treated with caution.
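
The scenario results above pair each absolute change with a relative change, so the baseline each pair implies can be back-calculated as a rough consistency check. The sketch below is illustrative only: it assumes each percentage is expressed relative to the baseline scenario, which the abstract does not state explicitly, and the names `REPORTED`, `implied_baseline` and `implied_baselines` are hypothetical. Because the reported figures are rounded, the results are approximate.

```python
# Back-calculate the baseline implied by each (absolute change, relative change)
# pair reported in the abstract. ASSUMPTION: each percentage is relative to the
# baseline scenario; the abstract does not report the baselines themselves.

REPORTED = {
    # indicator: (absolute change, relative change as a fraction)
    "grain yield (t/ha)": (1.2, 0.30),
    "WUE (kg/ha/mm)": (2.0, 0.30),
    "NUE (kg/kg)": (13.0, 0.21),
    "N leaching reduction (kg N/ha)": (8.0, 0.33),
    "SOM loss reduction, 0-10 cm (t/ha)": (3.1, 0.31),
}


def implied_baseline(abs_change: float, rel_change: float) -> float:
    """Return the baseline for which abs_change equals rel_change * baseline."""
    if rel_change <= 0:
        raise ValueError("relative change must be a positive fraction")
    return abs_change / rel_change


def implied_baselines(reported: dict) -> dict:
    """Apply implied_baseline to every indicator, rounded to one decimal."""
    return {
        name: round(implied_baseline(abs_change, rel_change), 1)
        for name, (abs_change, rel_change) in reported.items()
    }


if __name__ == "__main__":
    for name, baseline in implied_baselines(REPORTED).items():
        print(f"{name}: implied baseline of about {baseline}")
```

Under these assumptions, the productivity figures imply a baseline yield of about 4 t ha⁻¹. These values are derived from rounded numbers and are not results reported by the study.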