Abstract

In the past few years, face videos such as video conferences, interviews and variety shows have grown explosively, reaching millions of users over social media networks. The compression algorithms applied to these videos to reduce bandwidth unfortunately introduce annoying artifacts in face regions. This paper addresses the problem of face quality enhancement in compressed videos by reducing the artifacts in face regions. Specifically, we establish a compressed face video (CFV) database, which includes 196,337 faces in 214 high-quality video sequences and their corresponding 1,712 compressed sequences. We find that the faces in compressed videos exhibit tremendous scale variation and quality fluctuation. Motivated by scalable video coding, we propose a multi-scale recurrent scalable network (MRS-Net+) to enhance the quality of multi-scale faces in compressed videos. MRS-Net+ comprises one base level and two refined enhancement levels, corresponding to the quality enhancement of small-, medium- and large-scale faces, respectively. In this multi-level architecture, the quality enhancement of small-scale faces serves as the basis for that of medium-scale faces, which in turn serves as the basis for that of large-scale faces. We further develop a landmark-assisted pyramid alignment (LPA) subnet to align faces across consecutive frames, and then apply a mask-guided quality enhancement (QE) subnet to enhance the multi-scale faces. Finally, experimental results show that MRS-Net+ achieves, on average, a 1.196 dB improvement in peak signal-to-noise ratio (PSNR) and a 23.54% saving in Bjøntegaard delta bit-rate (BD-rate), significantly outperforming other state-of-the-art methods.
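
The PSNR improvement quoted above is a difference between two PSNR values measured against the same uncompressed reference: one for the compressed face and one for the enhanced face. The following is a minimal sketch of that arithmetic, assuming 8-bit pixel values (peak of 255) supplied as flat lists. The function names `psnr` and `psnr_gain` are hypothetical and not taken from the paper, and the sketch does not reproduce the paper's evaluation protocol (for example, how face regions are cropped or which color channel is measured).

```python
import math


def psnr(reference, distorted, peak=255.0):
    """Return the PSNR in dB between two equally sized pixel sequences."""
    if len(reference) != len(distorted) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel positions.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        # Identical signals: PSNR is unbounded.
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


def psnr_gain(reference, compressed, enhanced, peak=255.0):
    """Return the PSNR improvement (dB) of `enhanced` over `compressed`."""
    return psnr(reference, enhanced, peak) - psnr(reference, compressed, peak)


if __name__ == "__main__":
    # Toy 4-pixel "face region": enhancement halves the per-pixel error.
    reference = [100, 100, 100, 100]
    compressed = [104, 96, 104, 96]
    enhanced = [102, 98, 102, 98]
    print(f"gain: {psnr_gain(reference, compressed, enhanced):.3f} dB")
```

Because PSNR is logarithmic in the mean squared error, a gain of about 1.2 dB corresponds to reducing the mean squared error by roughly a quarter, independent of the starting quality.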

