Forest gap models have been applied widely to examine forest development under natural conditions and to investigate the effect of climate change on forest succession. Due to the complexity and parameter requirements of such models, a rigorous evaluation is required to build confidence in the simulation results. However, appropriate data for model assessment are scarce at the large spatial and temporal scales of successional dynamics. In this study, we explore a data source that has rarely been used for the evaluation of forest gap models: large-scale National Forest Inventory data. The key objectives of this study were (a) to examine the potential and limitations of using large-scale forest inventory data for evaluating the performance of forest gap models and (b) to test two particular models as case studies to derive recommendations for their future improvement. We used data from the first Swiss National Forest Inventory to examine the species basal area and tree numbers in different diameter classes simulated by the gap models ForClim (version 2.9.3) and PICUS (version 1.4) for forest types that are typical of mountain forests in Switzerland. The results showed the potential of data from large-scale forest inventories for evaluating model performance. Since this type of data is typically based on a large number of samples across environmental gradients, it is particularly suited for investigations at the general level of the dominant species based on stand basal area. However, the surprisingly small variability of juvenile trees (trees with a diameter at breast height, dbh, of < 12 cm) indicated limitations of the data used. Insufficient representativeness due to small sample plot size and uncertainty regarding past management limit an evaluation of structural forest aspects such as species diversity and the number of small trees (dbh < 12 cm). The examined models reproduced the observed species composition satisfactorily.
However, there were clear model deficiencies in the simulation of successional patterns and of juvenile tree numbers. We identified priorities for future model development. We conclude that large-scale forest inventory data can be valuable for model evaluation, particularly when they cover large environmental gradients and do not come from intensively managed forests. Because of their limitations, however, they must be complemented by other data, such as those from a full cruise.