Learning-based lossy image compression usually involves joint optimization of rate-distortion performance and requires coping with the spatial variation of image content and the contextual dependence among learned codes. Traditional entropy models can spatially adapt the local bit rate to the image content, but are usually limited in exploiting context in the code space. On the other hand, most deep context models are computationally expensive and cannot decode the symbols efficiently in parallel. In this paper, we present a content-weighted encoder-decoder model, in which channel-wise multi-valued quantization is deployed to discretize the encoder features, and an importance map subnet is introduced to generate importance masks for spatially varying code pruning. Consequently, the summation of the importance masks serves as an upper bound on the bitstream length. Furthermore, the quantized representations of the learned code and the importance map remain spatially dependent and can be losslessly compressed using arithmetic coding. To compress the codes effectively and efficiently, we propose an upper-triangular masked convolutional network (triuMCN) for large context modeling. Experiments show that the proposed method produces visually more pleasing results and performs favorably against both deep and traditional lossy image compression approaches.
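To make the importance-mask mechanism concrete, the following is a minimal sketch of how a spatial importance map could be expanded into a channel-wise binary mask for code pruning. It assumes a PyTorch-style implementation; the function name `importance_to_mask`, the quantization scheme with `levels`, and the tensor shapes are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def importance_to_mask(p, n_channels, levels=16):
    """Expand a spatial importance map into a channel-wise binary mask.

    p          : (B, 1, H, W) tensor with values in [0, 1] (importance map)
    n_channels : number of channels in the learned code
    levels     : number of quantization levels for the importance map

    Returns a (B, n_channels, H, W) binary mask; at each spatial position the
    first few code channels (proportional to the quantized importance) are
    kept and the remaining channels are pruned.
    """
    # Quantize the importance map to integer levels in {0, ..., levels}
    q = torch.ceil(p.clamp(0, 1) * levels)                     # (B, 1, H, W)
    # Number of code channels to keep at each spatial position
    keep = torch.ceil(q * n_channels / levels)                 # (B, 1, H, W)
    # Channel index grid, broadcast against the spatial importance map
    idx = torch.arange(n_channels, device=p.device).view(1, n_channels, 1, 1)
    mask = (idx < keep).float()                                # (B, n, H, W)
    return mask

# The pruned code is code * mask; mask.sum() then upper-bounds the number of
# symbols that must be entropy-coded, matching the abstract's claim that the
# summation of importance masks bounds the bitstream length.
```

Under this (assumed) construction, positions with high importance retain more code channels, so the bit rate adapts spatially to image content while the total mask sum gives a rate upper bound.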