The effects of sample size and sample prevalence on cellular automata simulation of urban growth

Bin Zhang, Chang Xia

doi:10.1080/13658816.2021.1931237

Abstract

This study investigates the effects of sample size and sample prevalence on cellular automata (CA) simulation of urban growth. Taking CA models based on an artificial neural network (ANN), logistic regression (LR), and a support vector machine (SVM) as examples, we simulate the urban growth of Wuhan city in China and of the Wuhan Metropolitan Area under different sampling schemes. The results of the ANN-, LR-, and SVM-based CA models are generally consistent. Sampling schemes that combine a small sample size with a low sample prevalence should be avoided because of their high uncertainty. Sample size determines the robustness of a CA model, whereas sample prevalence affects its performance once the sample is sufficiently large. In particular, the closer the sample prevalence is to the population prevalence, the higher the simulation accuracy and the lower the shape complexity and fragmentation of the simulated urban patterns. We suggest that the optimal sampling scheme uses a sample rate of 1% and a sample prevalence equal to the population prevalence. The choice of the optimal sampling scheme is independent of the population sizes represented by the different study areas.
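To make the two sampling parameters concrete, the minimal Python sketch below draws a calibration sample from a list of binary land-change labels at a given sample rate (sample size as a share of all candidate cells) and a given sample prevalence (share of positive cells in the sample). It assumes labels coded 1 for cells that became urban and 0 otherwise; the function names `population_prevalence` and `draw_sample` are hypothetical and are not taken from the study. Leaving `prevalence` unset reproduces the recommended scheme, in which the sample prevalence matches the population prevalence.

```python
import random


def population_prevalence(labels):
    """Share of positive cells (assumed 1 = converted to urban) among all cells."""
    return sum(labels) / len(labels)


def draw_sample(labels, sample_rate, prevalence=None, seed=0):
    """Return indices of a sample of size round(sample_rate * N).

    Hypothetical helper for illustration. If prevalence is None, the sample
    prevalence is set to the population prevalence; otherwise positive and
    negative cells are drawn in the requested ratio.
    """
    rng = random.Random(seed)
    n_total = round(sample_rate * len(labels))
    if prevalence is None:
        prevalence = population_prevalence(labels)

    # Split candidate cells into the two classes.
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]

    # Number of positives and negatives implied by the sampling scheme.
    n_pos = round(prevalence * n_total)
    n_neg = n_total - n_pos
    if n_pos > len(pos) or n_neg > len(neg):
        raise ValueError("not enough cells of one class for this scheme")

    # Sample each class without replacement.
    return rng.sample(pos, n_pos) + rng.sample(neg, n_neg)
```

For example, with 10,000 candidate cells of which 500 became urban, `draw_sample(labels, 0.01)` returns 100 indices, 5 of them positive, whereas passing `prevalence=0.5` yields a balanced sample of 50 positive and 50 negative cells.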
