Existing crowd counting methods are mainly trained and tested on similar scenes. When the test scene differs from the training scenes, their counting accuracy drops sharply, which severely limits their practical application. To address this problem, we propose a multistage gated fusion network (MGFNet) for cross-scene crowd counting. MGFNet is built primarily from dynamic gated convolution units (DGCU) and multilevel scale attention blocks (MSAB). Specifically, DGCU uses a dynamic gating path to supplement detailed information, which reduces both the loss of crowd information and the overestimation of background across scenes. MSAB generates attention maps carrying discriminative information to calibrate crowd information across the scales and perspectives found in different scenes. In addition, we use a new global-local consistency loss to optimize the model so that it adapts to changes in crowd density and distribution. Extensive experiments on four crowd counting benchmarks covering different scene types show that the proposed MGFNet achieves superior cross-scene counting performance.
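
The abstract does not state the form of the global-local consistency loss, so the following is only an illustrative sketch of one plausible reading, not the formulation used by MGFNet. It assumes density maps given as nested lists and compares predicted and ground-truth counts both over the whole map and over a grid of local patches. The function names, the `patches` grid size, and the `local_weight` factor are all hypothetical.

```python
from typing import List

Grid = List[List[float]]


def region_count(density: Grid, r0: int, r1: int, c0: int, c1: int) -> float:
    """Sum a density map over rows [r0, r1) and columns [c0, c1)."""
    return sum(sum(row[c0:c1]) for row in density[r0:r1])


def global_local_count_loss(
    pred: Grid,
    target: Grid,
    patches: int = 2,           # assumed grid size, not from the paper
    local_weight: float = 1.0,  # assumed weighting, not from the paper
) -> float:
    """Hypothetical loss: global count error plus mean patch count error."""
    height, width = len(target), len(target[0])
    if len(pred) != height or any(len(row) != width for row in pred):
        raise ValueError("pred and target must have the same shape")

    # Global term: absolute difference between the total counts.
    global_err = abs(
        region_count(pred, 0, height, 0, width)
        - region_count(target, 0, height, 0, width)
    )

    # Local term: mean absolute count difference over a patches x patches grid.
    local_err = 0.0
    for i in range(patches):
        r0, r1 = i * height // patches, (i + 1) * height // patches
        for j in range(patches):
            c0, c1 = j * width // patches, (j + 1) * width // patches
            local_err += abs(
                region_count(pred, r0, r1, c0, c1)
                - region_count(target, r0, r1, c0, c1)
            )
    local_err /= patches * patches

    return global_err + local_weight * local_err
```

Under this sketch, a prediction with the correct total count but with people placed in the wrong regions incurs no global error yet is still penalized by the local term, which is the kind of sensitivity to crowd distribution the abstract alludes to.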