Abstract
Recently, many deep-learning-based methods have been proposed to explore the potential quality improvement of compressed videos. These methods mostly utilize either spatial or temporal information to perform frame-level video enhancement. However, they fail to combine spatial and temporal information so as to adaptively exploit adjacent patches when enhancing the current patch, and thus achieve limited enhancement performance, especially on scene-changing and strong-motion videos. To overcome these limitations, we propose a patch-wise spatial-temporal quality enhancement network that first extracts spatial and temporal features, then recalibrates and fuses them. Specifically, we design a temporal and spatial-wise attention-based feature distillation structure that adaptively exploits adjacent patches to distill patch-wise temporal features. To adaptively enhance each patch with both spatial and temporal information, a channel and spatial-wise attention fusion block is proposed to achieve patch-wise recalibration and fusion of spatial and temporal features. Experimental results demonstrate that our network achieves a peak signal-to-noise ratio improvement of 0.55 to 0.69 dB over the compressed videos at different quantization parameters, outperforming state-of-the-art approaches.
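To make the fusion step concrete, the sketch below shows one plausible form of a channel and spatial-wise attention fusion block in PyTorch. It is a minimal illustration, not the authors' released code: the class name AttentionFusionBlock, the CBAM-style attention layout, and the assumption that the spatial and temporal feature maps share the same shape (B, C, H, W) are all our own illustrative choices.

```python
# Minimal sketch (assumed, not the paper's implementation) of a channel- and
# spatial-wise attention fusion block that recalibrates and fuses spatial and
# temporal features of a patch.
import torch
import torch.nn as nn


class AttentionFusionBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel attention: squeeze spatial dims, produce per-channel weights.
        self.channel_fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, (2 * channels) // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d((2 * channels) // reduction, 2 * channels, 1),
            nn.Sigmoid(),
        )
        # Spatial attention: a 7x7 conv over pooled channel statistics.
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )
        # Fuse the recalibrated features back to the original channel count.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, spatial_feat: torch.Tensor, temporal_feat: torch.Tensor) -> torch.Tensor:
        x = torch.cat([spatial_feat, temporal_feat], dim=1)   # (B, 2C, H, W)
        x = x * self.channel_fc(x)                            # channel-wise recalibration
        avg_map = x.mean(dim=1, keepdim=True)                 # (B, 1, H, W)
        max_map, _ = x.max(dim=1, keepdim=True)               # (B, 1, H, W)
        x = x * self.spatial_conv(torch.cat([avg_map, max_map], dim=1))  # spatial recalibration
        return self.fuse(x)                                   # (B, C, H, W)


if __name__ == "__main__":
    block = AttentionFusionBlock(channels=64)
    s = torch.randn(1, 64, 64, 64)   # spatial features of one patch
    t = torch.randn(1, 64, 64, 64)   # distilled temporal features
    print(block(s, t).shape)         # torch.Size([1, 64, 64, 64])
```

The fused output would then be added to the decoded patch as an enhancement residual; the exact feature widths, attention design, and fusion depth in the paper may differ from this sketch.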