We propose a spatiotemporal coupling deep neural network approach for time-resolved reconstruction of the velocity field around a circular cylinder. The network leverages two distinct data types: (1) non-time-resolved velocity fields around the cylinder, acquired with both fixed-frequency and variable-frequency sampling, and (2) the time-resolved surface-pressure sequence on the cylinder. The deep neural network comprises two sub-networks: a convolutional autoencoder (CAE) for nonlinear mode extraction and a Transformer for sequence-to-sequence learning. We refer to this architecture as CTNet (CAE-Transformer Network). The CAE encoder maps the non-time-resolved velocity fields to a latent vector, enabling the extraction of nonlinear modal coefficients. An appropriate time-window length for the surface-pressure sequence is then selected to establish a Transformer sequence-learning model, which takes the chosen sequence as input and predicts the corresponding nonlinear modal coefficients. Once the Transformer is trained, the time-resolved nonlinear modal coefficients of the velocity field can be obtained. Combined with the trained CAE decoder, the time-resolved velocity field can then be reconstructed from the Transformer output. We verify the performance of CTNet on a simulated dataset at a representative Reynolds number of 3900. The results show a relative reconstruction error of only 6.3% for the time-resolved velocity field, demonstrating that the reconstruction is highly reliable. We further compare the velocity fields reconstructed with and without the variable-frequency sampling data, and find that including these data significantly improves the reconstruction quality.
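To make the two-stage pipeline concrete, the sketch below outlines how a CAE-plus-Transformer reconstruction of this kind could be assembled in PyTorch. It is a minimal illustration, not the authors' implementation: all class names, layer sizes, the latent dimension, the number of pressure sensors, the grid size, and the window length are assumptions, since the abstract does not specify the actual architecture.

```python
# Illustrative CTNet-style sketch (hypothetical layer sizes, latent dimension,
# sensor count, and window length; the actual architecture is not given here).
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """CAE encoder: maps a velocity snapshot (u, v channels) to latent modal
    coefficients, which serve as training targets for the Transformer."""
    def __init__(self, latent_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(latent_dim),
        )

    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    """CAE decoder: reconstructs the velocity field from latent modal coefficients."""
    def __init__(self, latent_dim=16, grid=(64, 64)):
        super().__init__()
        self.grid = grid
        self.fc = nn.Linear(latent_dim, 32 * (grid[0] // 4) * (grid[1] // 4))
        self.net = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 2, 4, stride=2, padding=1),
        )

    def forward(self, z):
        h = self.fc(z).view(-1, 32, self.grid[0] // 4, self.grid[1] // 4)
        return self.net(h)

class PressureToModes(nn.Module):
    """Transformer: maps a time-resolved surface-pressure window to the
    nonlinear modal coefficients at the corresponding instant
    (positional encoding omitted for brevity)."""
    def __init__(self, n_sensors=32, d_model=64, latent_dim=16):
        super().__init__()
        self.embed = nn.Linear(n_sensors, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, latent_dim)

    def forward(self, p_seq):                    # p_seq: (batch, window, n_sensors)
        h = self.transformer(self.embed(p_seq))
        return self.head(h[:, -1])               # coefficients at the last time step

# Inference path: pressure window -> modal coefficients -> reconstructed velocity field
decoder, p2m = Decoder(), PressureToModes()
p_window = torch.randn(1, 50, 32)                # 50-step window, 32 sensors (assumed)
velocity = decoder(p2m(p_window))                # shape: (1, 2, 64, 64)
```

In a full training setup, the CAE would first be trained to reconstruct the velocity snapshots; the Transformer would then be fit to map pressure windows to the frozen encoder's latent coefficients, mirroring the two-stage procedure described above.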