Simulation models of social processes may require data that are not readily available, or that are inaccurate, incomplete, or biased. The paper presents a formal process for collating, assessing, selecting, and using secondary data as part of creating, validating, and documenting an agent-based simulation model of a complex social process, in this case asylum migration to Europe. The process starts by creating an inventory of data sources and the associated metadata, followed by an assessment of different aspects of data quality according to pre-defined criteria. As a result, based on the typology of available data, we are able to produce a thematic map of the area under study and to assess the uncertainty of key data sources, at least qualitatively. We illustrate the process using data on Syrian migration to Europe in 2011–21. In parallel, successive stages of the development of a simulation model make it possible to identify the key types of information needed as input into empirically grounded modelling analysis. Juxtaposing the available evidence with model requirements reveals knowledge gaps that need filling, preferably by collecting additional primary data or, failing that, by carrying out a sensitivity analysis of the assumptions made. In doing so, we offer a way of formalising the data collection process in the context of model-building endeavours, while allowing the modelling to remain predominantly question-driven rather than purely data-driven. The paper concludes with recommendations on data and evidence, for both modellers and model users in practice-oriented applications.