Abstract
Quantization in lossy video compression can cause severe quality degradation, especially at low bit-rates. Developing post-processing methods that improve the visual quality of decoded images is of great importance, as they can be directly incorporated into any existing compression standard or paradigm. In this article we propose a two-stage method for video compression artifact reduction: a texture-detail restoration stage followed by a deep convolutional neural network (CNN) fusion stage. The first stage operates patch by patch. For each patch in the current decoded frame, one prediction is formed under the sparsity prior, which assumes that natural image patches can be represented by a sparse activation of dictionary atoms. Under the temporal-correlation hypothesis, we search for the best-matching patch in each reference frame and select the matches with richer texture details to build motion-compensated predictions. The second stage stacks the predictions obtained in the first stage, along with the decoded frame itself, into a tensor, and trains a deep CNN to learn the mapping from this tensor to the original uncompressed image. Experimental results demonstrate that the proposed two-stage method remarkably improves, both subjectively and objectively, the quality of compressed video sequences.
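As an illustration of the sparsity prior invoked above, the following sketch reconstructs a patch as a sparse combination of dictionary atoms via orthogonal matching pursuit. This is a toy stand-in, not the paper's pipeline: the dictionary here is random with a synthetically 2-sparse patch, whereas the method learns its dictionaries from training data.

```python
import numpy as np

def omp(D, x, k):
    """Orthogonal matching pursuit: approximate x using at most k atoms of D."""
    residual = x.copy()
    support = []
    coeffs = np.zeros(D.shape[1])
    for _ in range(k):
        # greedily pick the atom most correlated with the current residual
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx not in support:
            support.append(idx)
        # least-squares refit on the selected atoms, then update the residual
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    coeffs[support] = sol
    return coeffs

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))            # 64-dim patches (8x8), 256 atoms
D /= np.linalg.norm(D, axis=0)                # unit-norm atoms
x = D[:, [3, 17]] @ np.array([3.0, -2.0])     # a patch that is exactly 2-sparse
a = omp(D, x, k=2)
print(np.count_nonzero(a), np.allclose(D @ a, x))
```

Because the synthetic patch truly lies in the span of two atoms, OMP recovers a 2-sparse code that reconstructs it; real decoded patches are only approximately sparse, so the reconstruction acts as a denoised prediction.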
Highlights
Quantization in lossy image and video compression is a many-to-one mapping
Methods in the second class are restoration oriented: they regard compression as a distortion process, and restoration from a decoded image is usually formulated as an ill-posed inverse problem, typically solved by exploiting image priors such as projections onto convex sets [3], [4], block-based sparse representation [5]–[11], total variation [12], and Markov random fields [13], [14]
For learning the dictionaries used for sparse reconstruction and for training the deep convolutional neural network (CNN) in our method, we built a large-scale dataset derived from the standard test sequences recommended by JCT-VC at four resolutions (Classes B, C, D, and E)
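Of the restoration-oriented priors listed in the highlights, total variation [12] is perhaps the simplest to demonstrate. The sketch below, a toy illustration rather than the cited method, denoises a piecewise-constant image by gradient descent on a data-fidelity term plus a smoothed total-variation penalty (all parameter values are illustrative assumptions):

```python
import numpy as np

def tv_denoise(y, lam=0.2, step=0.05, iters=200, eps=0.1):
    """Gradient descent on 0.5*||x - y||^2 + lam * TV_eps(x), where
    TV_eps is a smoothed (differentiable) total-variation term."""
    x = y.copy()
    for _ in range(iters):
        # forward differences with a replicated border, then gradient magnitude
        dx = np.diff(x, axis=1, append=x[:, -1:])
        dy = np.diff(x, axis=0, append=x[-1:, :])
        mag = np.sqrt(dx**2 + dy**2 + eps**2)
        px, py = dx / mag, dy / mag
        # divergence of the normalized gradient field (adjoint of the
        # forward-difference operator, up to boundary terms)
        div = (px - np.roll(px, 1, axis=1)) + (py - np.roll(py, 1, axis=0))
        x = x - step * ((x - y) - lam * div)
    return x

rng = np.random.default_rng(0)
clean = np.zeros((32, 32))
clean[:, 16:] = 1.0                                  # piecewise-constant image
noisy = clean + 0.1 * rng.standard_normal(clean.shape)
restored = tv_denoise(noisy)
```

The TV prior favors piecewise-smooth solutions, so the restored image has a much smaller total variation than the noisy input while the step edge is largely preserved.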
Summary
Quantization in lossy image and video compression is a many-to-one mapping. This means that a decoded block can be quite different from the original one, especially at low bit-rates. Developing post-processing methods that improve the visual quality of decoded images at the decoder side has attracted great interest from researchers, as such methods can be directly incorporated into any existing compression standard or paradigm. Most existing methods can be classified into three categories. When using deep neural networks, one trains a mapping function that estimates, for a given decoded input, its corresponding original.
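To make the second stage concrete, here is a minimal sketch (toy sizes and random data, not the paper's network) of how the decoded frame and its K motion-compensated predictions are stacked into the fusion network's input tensor. The learned deep mapping is stood in for by a single per-pixel weighted combination of channels, i.e. a 1x1 convolution whose weights would in practice be learned:

```python
import numpy as np

rng = np.random.default_rng(1)
H, W, K = 64, 64, 3                  # toy frame size and number of predictions
decoded = rng.random((H, W))
predictions = [rng.random((H, W)) for _ in range(K)]

# Stage-2 input: stack the decoded frame with its K predictions along channels
tensor = np.stack([decoded] + predictions, axis=0)   # shape (K+1, H, W)

# toy stand-in for the learned CNN fusion: a 1x1 convolution, i.e. a
# per-pixel weighted combination of the channels (here uniform weights)
w = np.full(K + 1, 1.0 / (K + 1))
fused = np.tensordot(w, tensor, axes=1)              # shape (H, W)
print(tensor.shape, fused.shape)
```

With uniform weights this reduces to a per-pixel average of the channels; the actual deep CNN replaces this single linear layer with many nonlinear layers so it can weight reliable predictions adaptively at each pixel.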