Abstract

Although deep convolutional neural networks (CNNs) have achieved great success in computer vision tasks, their real-world application is still impeded by their voracious demand for computational resources. Current works mostly seek to compress a network by reducing its parameters or the computation those parameters incur, neglecting the influence of the input image on system complexity. Based on the observation that the input images of a CNN contain substantial redundancy, in this paper we propose a unified framework, dubbed ThumbNet, to simultaneously accelerate and compress CNN models by enabling them to infer on a single thumbnail image. We provide three effective strategies to train ThumbNet, through which it learns an inference network that performs as well on small images as the original-input network does on large ones. With ThumbNet, we obtain not only a thumbnail-input inference network that drastically reduces computation and memory requirements, but also an image downscaler that generates thumbnail images for generic classification tasks. Extensive experiments show the effectiveness of ThumbNet and demonstrate that the thumbnail-input inference network it learns adequately retains the accuracy of the original-input network even when the input images are downscaled 16 times.

Highlights

  • Recent years have witnessed the growing performance of deep convolutional neural networks (CNNs) [13, 35, 38, 39, 47, 50, 51], along with their expanding computation and memory costs [3].

  • In order to give a clearer sense of the rationale, we re-illustrate the left segment of ThumbNet along with the FM loss from a different point of view in Fig. 3.

  • We propose a unified framework, ThumbNet, to tackle the problem of accelerating run-time deep convolutional networks from a novel perspective: downscaling the input image.


Summary

INTRODUCTION

Recent years have witnessed the growing performance of deep convolutional neural networks (CNNs) [13, 35, 38, 39, 47, 50, 51], along with their expanding computation and memory costs [3]. We propose to use a thumbnail image, i.e., an image of lower spatial resolution than its original-size counterpart, as the test-time network input to accelerate and compress CNNs of any architecture and of any depth and width. (1) We propose a mechanism for accelerating a deep network that is orthogonal to conventional methods: enabling the network to infer efficiently and effectively on a single downscaled image. To this end, we propose a unified framework called ThumbNet to train a thumbnail-input network that tremendously reduces computation and memory consumption while maintaining the accuracy of the original-input network. (2) The images generated by ThumbNet can replace their original-size counterparts and be stored for other classification-related tasks, reducing resource requirements in the long run. (3) The proposed ThumbNet effectively preserves network accuracy at speedup ratios of up to 4× (ImageNet) and 16× (Places) on various networks, surpassing other network acceleration/compression methods by significant margins.
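The core idea summarized above, running inference on a downscaled image while training the small-input network to mimic the original-input one, can be sketched minimally. The average-pooling downscaler and the mean-squared feature-mimicking (FM) loss below are illustrative assumptions on my part (ThumbNet's actual downscaler is learned, and the summary does not specify the FM loss formula); the sketch uses NumPy only:

```python
import numpy as np

def downscale(image: np.ndarray, factor: int) -> np.ndarray:
    """Downscale an HxWxC image by averaging factor x factor blocks.
    A stand-in for ThumbNet's learned downscaler (assumption)."""
    h, w, c = image.shape
    return image[:h - h % factor, :w - w % factor].reshape(
        h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))

def fm_loss(teacher_fm: np.ndarray, student_fm: np.ndarray) -> float:
    """Assumed feature-mimicking loss: MSE between matched teacher
    (original-input) and student (thumbnail-input) feature maps."""
    assert teacher_fm.shape == student_fm.shape
    return float(np.mean((teacher_fm - student_fm) ** 2))

rng = np.random.default_rng(0)
img = rng.random((224, 224, 3))
thumb = downscale(img, 4)   # 4x per side, i.e. 16x fewer pixels
print(thumb.shape)          # (56, 56, 3)
```

During training, the student would minimize `fm_loss` against the teacher's intermediate features so that the thumbnail-input network matches the original-input network's behavior despite the 16x reduction in input pixels.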

Knowledge Distillation
Auto-Encoder
Downscaled Image Representation
Network Architecture
Details of Network Design
Output
Training Details
EXPERIMENTS
Ablation Study
Methods
Comparison to State-of-the-Art Methods
Evaluation of the Supervised Downscaler
CONCLUSIONS