Background
Image-based crop growth modeling can substantially contribute to precision agriculture by revealing spatial crop development over time, which allows an early and location-specific estimation of relevant future plant traits, such as leaf area or biomass. A prerequisite for realistic and sharp crop image generation is the integration of multiple growth-influencing conditions in a model, such as an image of an initial growth stage, the associated growth time, and further information about the field treatment. While image-based models provide more flexibility for crop growth modeling than process-based models, there is still a significant research gap in the comprehensive integration of various growth-influencing conditions, and further investigation is needed to address it.

Methods
We present a two-stage framework consisting of an image generation model and a growth estimation model, which are trained independently. The image generation model is a conditional Wasserstein generative adversarial network (CWGAN). In the generator of this model, conditional batch normalization (CBN) is used to integrate conditions of different types along with the input image. This allows the model to generate time-varying artificial images dependent on multiple influencing factors. The second part of the framework uses these images for plant phenotyping by deriving plant-specific traits and comparing them with those of non-artificial (real) reference images. In addition, image quality is evaluated using multi-scale structural similarity (MS-SSIM), learned perceptual image patch similarity (LPIPS), and Fréchet inception distance (FID). During inference, the framework allows image generation for any combination of conditions used in training; we call this generation data-driven crop growth simulation.

Results
Experiments are performed on three datasets of different complexity. These datasets include the laboratory plant Arabidopsis thaliana (Arabidopsis) and crops grown under real field conditions, namely cauliflower (GrowliFlower) and crop mixtures consisting of faba bean and spring wheat (MixedCrop). In all cases, the framework generates realistic, sharp images, with a slight loss of quality from short-term to long-term predictions. For MixedCrop grown under varying treatments (different cultivars, sowing densities), the results show that adding this treatment information increases the generation quality and the phenotyping accuracy, measured by the estimated biomass. Simulations of varying growth-influencing conditions performed with the trained framework provide valuable insights into how such factors relate to crop appearance, which is particularly useful in complex, less explored crop mixture systems. Further results show that adding process-based simulated biomass as a condition increases the accuracy of the phenotypic traits derived from the predicted images. This demonstrates the potential of our framework to serve as an interface between a data-driven and a process-based crop growth model.

Conclusion
The realistic generation and simulation of future plant appearances is feasible with a multi-conditional CWGAN. The presented framework complements process-based models and overcomes their limitations, such as the reliance on assumptions and the low field-location specificity, by providing realistic visualizations of the spatial crop development that directly lead to a high explainability of the model predictions.
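
The conditional batch normalization named in the Methods can be illustrated with a minimal, framework-free sketch. It is not the authors' implementation: the function name `conditional_batch_norm`, the linear maps `w_gamma` and `w_beta`, and the use of flat feature vectors instead of image feature maps are all assumptions made for illustration. The idea shown is only the general one: features are normalized over the batch, and the scale and shift are then computed from a condition (for example, a normalized growth time) rather than being fixed parameters.

```python
import math


def conditional_batch_norm(x, cond, w_gamma, w_beta, eps=1e-5):
    """Illustrative conditional batch normalization (hypothetical helper).

    x:       batch of feature vectors, shape [B][C]
    cond:    batch of condition vectors, shape [B][K]
    w_gamma: weights mapping a condition to a per-channel scale offset, [C][K]
    w_beta:  weights mapping a condition to a per-channel shift, [C][K]
    """
    batch, channels = len(x), len(x[0])
    out = [[0.0] * channels for _ in range(batch)]
    for ch in range(channels):
        # Batch statistics for this channel (biased variance, as in batch norm).
        column = [x[b][ch] for b in range(batch)]
        mean = sum(column) / batch
        var = sum((v - mean) ** 2 for v in column) / batch
        inv_std = 1.0 / math.sqrt(var + eps)
        for b in range(batch):
            # Scale and shift depend on the sample's condition vector.
            # Assumption: a plain linear map; a real model would learn this.
            gamma = 1.0 + sum(w * c for w, c in zip(w_gamma[ch], cond[b]))
            beta = sum(w * c for w, c in zip(w_beta[ch], cond[b]))
            out[b][ch] = gamma * (x[b][ch] - mean) * inv_std + beta
    return out


# Two samples, two channels, one scalar condition per sample (made-up values).
x = [[1.0, 2.0], [3.0, 6.0]]
cond = [[0.0], [1.0]]
w_gamma = [[0.5], [0.0]]  # the condition rescales channel 0
w_beta = [[0.0], [2.0]]   # the condition shifts channel 1
out = conditional_batch_norm(x, cond, w_gamma, w_beta)
print(out)
```

With all weights set to zero the function reduces to ordinary batch normalization without learned affine parameters, so any difference between samples beyond the normalized features comes from the condition alone.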