Abstract

With recent advance of computer vision techniques, an increasing amount of image and video content is consumed by machines. However, existing image and video compression schemes are mainly designed for human vision, which are not optimized concerning machine vision. In this paper, we propose a saliency guided learned image compression scheme for machines, where object detection is considered as an example task. To obtain salient regions for machine vision, a saliency map is obtained for each detected object using an existing black-box explanation of neural networks, and maps for multiple objects are merged sophistically into one. Based on a neural network-based image codec, a bitrate allocation scheme has been designed which prunes the latent representation of the image according to the saliency map. During the training of end-to-end image codec, both pixel fidelity and machine vision fidelity are used for performance evaluation, where the degradation in detection accuracy is measured without ground-truth annotation. Experimental results demonstrate that the proposed scheme can achieve up to 14.1% reduction in bitrate with the same detection accuracy compared with the baseline learned image codec.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call