There is a critical need for efficient and reliable active flow control strategies to reduce drag and noise in aerospace and marine engineering applications. Traditional full-order models based on the Navier–Stokes equations are not feasible for such tasks, while advanced model reduction techniques can be inefficient for active control, especially in the presence of strong non-linearity and convection-dominated phenomena. Deep-learning-based reduced-order models built on convolutional recurrent autoencoder network architectures have recently been shown to be effective while running several orders of magnitude faster than full-order simulations. However, these models encounter significant challenges outside the training data, limiting their effectiveness for active control and optimization tasks. In this study, we aim to improve the extrapolation capability by modifying the network architecture and integrating coupled space–time physics as an implicit bias. Deep-learning-based reduced-order models generally decouple the spatial and temporal dimensions, which can introduce modeling and approximation errors. To alleviate these errors, we propose a novel technique for learning coupled spatial–temporal correlations using a three-dimensional convolution network. We assess the proposed technique against a standard encoder–propagator–decoder model and demonstrate superior extrapolation performance. To demonstrate the effectiveness of the three-dimensional convolution network, we consider the benchmark problem of flow past a circular cylinder under laminar flow conditions and use spatiotemporal snapshots from full-order simulations. The proposed three-dimensional convolution architecture accurately captures the velocity and pressure fields for varying Reynolds numbers. Compared to the standard encoder–propagator–decoder network, the spatiotemporal three-dimensional convolution network extends the range of Reynolds numbers outside the training data over which predictions remain accurate.
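
As a minimal illustration of what learning a coupled spatial–temporal correlation can mean, the sketch below applies a single three-dimensional filter to a stack of snapshots arranged as (time, y, x). It is not the architecture used in this study: the function `conv3d_valid`, the toy convected field, and the hand-picked kernel are all hypothetical and chosen only to show that one filter can span space and time at once.

```python
# Illustrative sketch only: one "valid" 3D cross-correlation over stacked snapshots.
# All names, sizes, and the toy field below are hypothetical, not from the study.

def conv3d_valid(volume, kernel):
    """Slide a (kt, kh, kw) kernel over a (T, H, W) nested list without padding."""
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for i in range(H - kh + 1):
            row = []
            for j in range(W - kw + 1):
                acc = 0.0
                # Accumulate over the time offset and both spatial offsets together.
                for dt in range(kt):
                    for di in range(kh):
                        for dj in range(kw):
                            acc += kernel[dt][di][dj] * volume[t + dt][i + di][j + dj]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Toy snapshots: a profile f(x - t) = (x - t)^2 convected one cell per time step.
T, H, W = 4, 3, 6
snapshots = [
    [[float((x - t) ** 2) for x in range(W)] for _ in range(H)]
    for t in range(T)
]

# A single space-time filter of shape (2, 1, 2): u(t+1, y, x+1) - u(t, y, x).
kernel = [
    [[-1.0, 0.0]],  # time offset 0: weight -1 at x, 0 at x+1
    [[0.0, 1.0]],   # time offset 1: weight 0 at x, +1 at x+1
]

response = conv3d_valid(snapshots, kernel)
```

For this unit-speed convection the filter response vanishes everywhere, because the filter compares each value with its convected position one step later. A purely spatial (two-dimensional) filter applied to one snapshot at a time cannot express this relation, which is the motivation for treating space and time jointly.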