Blind video watermarking (BVW) is the process of embedding visually imperceptible messages into a cover video so that they can be retrieved, even in the presence of distortion, without any reference to the original video. Compared with image watermarking, BVW faces two critical challenges: 1) ensuring the visual imperceptibility of the embedded information across sequential frames; 2) enabling quick localization of the watermarked frames under temporal and spatial attacks such as frame dropping and video compression. This paper presents a robust blind video watermarking network built on a block-based selection mechanism in the frequency domain. The network comprises an encoding part, which hides the watermark in suitable locations within the video to keep it visually imperceptible, and a decoding part, which tracks the watermark in the frequency domain to keep it robust even when the video is distorted. Moreover, a plug-and-play watermark detector is designed to discover watermarking locations; it is readily applicable and can be easily integrated into most frequency-domain watermark embedding methods. Experimental results demonstrate that the proposed method significantly outperforms state-of-the-art methods in both imperceptibility and robustness. Our framework achieves an average peak signal-to-noise ratio (PSNR) of 37.59 dB and a Learned Perceptual Image Patch Similarity (LPIPS) of 1.12×10⁻². Furthermore, the extraction accuracy reaches 99.99% under a variety of temporal attacks, which demonstrates the robustness of the proposed method.
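Two of the reported figures, the average PSNR and the extraction accuracy, come from standard formulas. The sketch below shows one way to compute them. It is not the authors' evaluation code, and it rests on several assumptions. Frames are 8-bit grayscale, so the peak value is 255. Video PSNR is taken as the mean of per-frame PSNR values, which is one common convention; the paper may aggregate differently. Extraction accuracy is the fraction of watermark bits recovered correctly. The helper names `psnr`, `video_psnr`, and `bit_accuracy` are hypothetical.

```python
import math
from typing import List, Sequence

Frame = Sequence[Sequence[float]]  # one grayscale frame as rows of pixel values


def psnr(cover: Frame, marked: Frame, max_val: float = 255.0) -> float:
    """PSNR in dB: 10 * log10(MAX^2 / MSE) between two same-sized frames."""
    if len(cover) != len(marked) or any(len(a) != len(b) for a, b in zip(cover, marked)):
        raise ValueError("frames must have the same shape")
    sq_err = 0.0
    count = 0
    for row_c, row_m in zip(cover, marked):
        for c, m in zip(row_c, row_m):
            sq_err += (c - m) ** 2
            count += 1
    mse = sq_err / count
    if mse == 0:
        return math.inf  # identical frames: PSNR is unbounded
    return 10.0 * math.log10(max_val ** 2 / mse)


def video_psnr(cover_frames: List[Frame], marked_frames: List[Frame],
               max_val: float = 255.0) -> float:
    """Average per-frame PSNR over a clip (assumed aggregation convention)."""
    if len(cover_frames) != len(marked_frames) or not cover_frames:
        raise ValueError("clips must be non-empty and have the same length")
    values = [psnr(c, m, max_val) for c, m in zip(cover_frames, marked_frames)]
    return sum(values) / len(values)


def bit_accuracy(sent: Sequence[int], recovered: Sequence[int]) -> float:
    """Fraction of watermark bits that were extracted correctly."""
    if len(sent) != len(recovered) or not sent:
        raise ValueError("bit strings must be non-empty and the same length")
    matches = sum(1 for s, r in zip(sent, recovered) if s == r)
    return matches / len(sent)


if __name__ == "__main__":
    cover = [[100, 100], [100, 100]]
    marked = [[101, 99], [100, 100]]  # MSE = (1 + 1) / 4 = 0.5
    print(f"PSNR: {psnr(cover, marked):.2f} dB")
    print(f"bit accuracy: {bit_accuracy([1, 0, 1, 1], [1, 0, 0, 1]):.2f}")
```

Averaging per frame means a single identical frame (infinite PSNR) makes the clip average infinite. Some evaluations instead pool the squared error over the whole clip before taking the logarithm, which changes the reported figure.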