Abstract

Intra/inter switching-based error-resilient video coding effectively enhances the robustness of video streaming over error-prone networks, but it incurs high computational complexity due to detailed end-to-end distortion prediction and brute-force search for rate-distortion optimization. In this article, a Low-Complexity Mode-Switching-based Error-Resilient Encoding (LC-MSERE) method is proposed to reduce encoder complexity through a deep learning approach. By designing and training multi-scale information fusion-based convolutional neural networks (CNNs), intra- and inter-mode coding unit (CU) partitions can be predicted rapidly and accurately, replacing brute-force search and a large number of end-to-end distortion estimations. For intra CU partition prediction, we propose a spatial multi-scale information fusion-based CNN (SMIF-Intra), in which a shortcut convolution architecture learns the multi-scale, multi-grained image information correlated with the CU partition. For inter CU partition prediction, we propose a spatial-temporal multi-scale information fusion-based CNN (STMIF-Inter), in which a two-stream convolution architecture learns both the spatial-temporal image texture and the distortion propagation among frames. Given the image content together with coding and transmission parameters, the networks accurately predict CU partitions for both intra and inter coding tree units (CTUs). Experiments show that our approach significantly reduces computation time for error-resilient video encoding with acceptable quality loss.
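To make the "multi-scale information fusion" idea concrete, the following is a minimal numpy sketch (not the authors' network): a 64x64 CTU is average-pooled at several CU-aligned scales and the resulting maps are concatenated into one feature vector, so fine and coarse texture cues are fused in a single representation. The pooling sizes and the plain-pooling choice are illustrative assumptions; the paper uses trained convolutional layers with shortcut connections instead.

```python
import numpy as np

def avg_pool(block, k):
    """Non-overlapping k x k average pooling of a square block."""
    h, w = block.shape
    return block.reshape(h // k, k, w // k, k).mean(axis=(1, 3))

def multiscale_features(ctu):
    """Fuse CTU statistics at CU-aligned scales (illustrative).

    Pooling sizes 8, 16, 32 and 64 produce maps whose cells line up
    with candidate CU sizes, so the concatenated vector carries both
    fine-grained and coarse-grained texture information.
    """
    maps = [avg_pool(ctu, k) for k in (8, 16, 32, 64)]
    return np.concatenate([m.ravel() for m in maps])

ctu = np.random.rand(64, 64)          # stand-in for a luma CTU
feat = multiscale_features(ctu)
# 8x8 + 4x4 + 2x2 + 1x1 cells -> 64 + 16 + 4 + 1 = 85 features
```

In the paper such multi-scale information is learned by convolutional layers and combined with coding and transmission parameters before the CU-partition decision; this sketch only shows why aligning the scales with CU sizes yields partition-relevant features.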
