Image compression aims to minimize the amount of data needed to represent an image while maintaining a given level of visual quality for human viewers, and it is an essential technique for storage and transmission. With the recent development of computer vision, machines have become another primary receiver of images, and they require compressed images at a quality level that may differ from the one human vision requires. In many scenarios, compressed images must serve both human and machine vision tasks, yet few compression methods are designed for both goals simultaneously. In this article, we propose a unified and scalable deep image compression (USDIC) framework that jointly optimizes image quality for human and machine vision in an end-to-end manner. For the encoder, we propose an information splitting mechanism (ISM) that separates an image into semantic and visual features, which mainly serve machine analysis and human viewing tasks, respectively. For the decoder, we design a scalable decoding architecture: the encoded semantic features are decoded first for machine analysis tasks, and the image is then reconstructed by leveraging the decoded semantic features. To further remove the redundancy between the semantic and visual features, we propose a scalable entropy model (SEM) with a joint optimization strategy, and we reconstruct the image using the two kinds of decoded features. Extensive experimental results show that the proposed USDIC achieves much better performance on the image analysis task while maintaining competitive performance on the traditional image reconstruction task compared with popular image compression methods.
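The scalable decoding idea can be illustrated with a deliberately simplified toy, which is not the USDIC networks themselves. In this sketch, which rests entirely on assumptions of ours, a coarse quantization index stands in for the "semantic" layer and the remaining residual stands in for the "visual" layer; the names `LayeredStream`, `split_encode`, `decode_semantic`, and `decode_image` are hypothetical and do not come from the article. It shows only the data flow: a machine task can stop after the first layer, while full reconstruction reuses that layer instead of coding the same information twice.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LayeredStream:
    """Toy two-layer code (hypothetical stand-in for semantic/visual features)."""
    semantic: List[int]  # base layer: enough for a coarse analysis task
    visual: List[int]    # enhancement layer: residual relative to the base layer


def split_encode(pixels: List[int], step: int = 16) -> LayeredStream:
    # Toy "information splitting": a coarse index per pixel, plus what it misses.
    semantic = [p // step for p in pixels]
    # The residual is coded relative to the base layer, so no information
    # is stored in both layers (a crude analogue of redundancy removal).
    visual = [p - s * step for p, s in zip(pixels, semantic)]
    return LayeredStream(semantic, visual)


def decode_semantic(stream: LayeredStream, step: int = 16) -> List[int]:
    # First decoding stage: uses the base layer only.
    return [s * step + step // 2 for s in stream.semantic]


def decode_image(stream: LayeredStream, step: int = 16) -> List[int]:
    # Second decoding stage: reuses the base layer, then adds the residual.
    return [s * step + v for s, v in zip(stream.semantic, stream.visual)]


if __name__ == "__main__":
    pixels = [0, 17, 200, 255]
    stream = split_encode(pixels)
    coarse = decode_semantic(stream)
    # A toy "machine task" that needs only the base layer: bright vs. dark.
    print([value >= 128 for value in coarse])
    print(decode_image(stream) == pixels)
```

In the framework described above, both layers are learned features and the dependency between them is handled by the entropy model rather than by integer arithmetic, so this toy should be read only as a picture of the two-stage decoding order.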