Video super-resolution methods focus on restoring high-resolution frames from low-resolution videos with pre-defined degradations, and few consider compression. However, videos on the Internet are compressed to reduce their massive data size, so they also contain visual noise and artifacts introduced by lossy compression. In this paper, we propose a two-stage super-resolution network designed specifically for compressed videos, which suppresses artifacts and restores high-resolution content. The network contains three new modules: cleanup and local alignment in the first stage, and 3D-convolution temporal–spatial attention in the second stage. Specifically, the cleanup module in stage 1 suppresses the artifacts and noise of low-quality inputs. The subsequent local alignment module aggregates adjacent frames to preserve high-frequency details. The 3D-convolution attention in stage 2 refines the weights of the feature maps. In addition, we propose a new dataset, CrfVideos, containing HEVC/H.265 compressed videos at different compression levels, to facilitate fair comparisons. Extensive experiments show that our method outperforms state-of-the-art methods by 0.3–0.7 dB in PSNR on the HEVC/H.265 compressed benchmark at medium to high compression rates, and by 0.2–0.8 dB on the Vid4 benchmark. The code will be released at https://github.com/cvygkhv/decompression.
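
To make the reported PSNR gains concrete, the following is a minimal sketch of the standard PSNR definition, 10·log10(peak² / MSE), written in plain Python. It is not the evaluation code of this work: the function names `psnr` and `mse_ratio_for_gain`, the 8-bit peak value of 255, and the flat-list frame representation are illustrative assumptions, and the abstract does not state details such as which color channel is evaluated or how frames are averaged.

```python
import math


def psnr(reference, restored, peak=255.0):
    """Return PSNR in dB between two frames given as flat sequences of pixel values.

    Assumption: both frames share the same size and value range (8-bit by default).
    """
    if len(reference) != len(restored):
        raise ValueError("frames must have the same number of pixels")
    if len(reference) == 0:
        raise ValueError("frames must not be empty")
    # Mean squared error over all pixels.
    mse = sum((r - d) ** 2 for r, d in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Return the factor by which MSE shrinks for a given PSNR gain (same peak value)."""
    return 10.0 ** (-gain_db / 10.0)


if __name__ == "__main__":
    # Toy 4-pixel frames, purely illustrative.
    reference = [52, 55, 61, 59]
    restored = [54, 55, 60, 57]
    print(f"PSNR: {psnr(reference, restored):.2f} dB")
    for gain in (0.3, 0.7):
        print(f"+{gain} dB PSNR -> MSE multiplied by {mse_ratio_for_gain(gain):.3f}")
```

Under this definition, and at a fixed peak value, a PSNR gain of 0.3 dB corresponds to roughly a 7% reduction in mean squared error, and a gain of 0.7 dB to roughly a 15% reduction.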