Abstract

Deep neural networks (DNNs) have been widely applied to many computer vision problems. These tasks are typically performed on high-quality input images without regard to storage and transmission costs, yet images must be compressed before they can be sent over bandwidth-constrained networks. Recent research on deep-learning-based image compression has shown promising performance compared with traditional image compression codecs. However, these approaches focus on improving user-perceived visual quality rather than achieving high DNN inference accuracy for computer vision. In this work, we design a concrete system with the goal of maximizing the computer vision performance metric subject to a compression ratio constraint. The entire framework is optimized efficiently in an end-to-end manner, without multiple training phases. We find that the conventional distortion metric in compression, mean squared error (MSE), does not suffice to achieve the desired computer vision performance in our system; it is essential to exploit machine-centric evaluation metrics for high inference accuracy. We also propose applying class-agnostic object masks combined with a channel attention mechanism to dynamically allocate bits between regions of interest (ROI) and background regions (BG), so as to optimize computer vision performance over a range of bitrates. We evaluate three diverse applications separately: image classification, human pose estimation, and semantic segmentation. Extensive experiments show that our approach not only outperforms many traditional compression codecs in image compression, but also achieves better computer vision performance than all other counterparts.
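
The goal stated above, task performance under a compression constraint, can be written as a constrained optimization problem. The following is a minimal sketch under assumptions and not the paper's exact formulation: all symbols (encoder E_phi, decoder D_psi, quantizer Q, task network T_theta, rate estimate R, rate budget R_0, trade-off weight lambda) are hypothetical notation introduced here for illustration, and the Lagrangian relaxation in the last line is the standard way such constraints are handled in learned compression, assumed rather than confirmed for this system.

```latex
% Hypothetical notation (not taken from the paper):
%   x: input image, y: task label, \hat{x}: reconstructed image
%   E_\phi: encoder, Q: quantizer, D_\psi: decoder, T_\theta: task network
%   R(\cdot): estimated bitrate, R_0: bitrate budget, \lambda: trade-off weight
\begin{aligned}
\hat{x} &= D_\psi\big(Q(E_\phi(x))\big), \\
\min_{\phi,\psi,\theta}\;
  & \mathbb{E}_{(x,y)}\big[\mathcal{L}_{\text{task}}\big(T_\theta(\hat{x}),\, y\big)\big]
  \quad \text{s.t.} \quad
  \mathbb{E}_{x}\big[R\big(Q(E_\phi(x))\big)\big] \le R_0, \\
\mathcal{L}(\phi,\psi,\theta)
  &= \mathbb{E}_{x}\big[R\big(Q(E_\phi(x))\big)\big]
   + \lambda\, \mathbb{E}_{(x,y)}\big[\mathcal{L}_{\text{task}}\big(T_\theta(\hat{x}),\, y\big)\big].
\end{aligned}
```

In this sketch the task loss takes the place that MSE distortion usually holds in a rate-distortion objective, which matches the abstract's point that a machine-centric metric is needed. Sweeping lambda would trace out operating points over a range of bitrates.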
