Abstract

Recent deep learning models achieve high accuracy and fast inference times, but they require high-performance computing resources because they have a large number of parameters. However, not all systems have high-performance hardware. Sometimes, a deep learning model needs to run on edge devices such as IoT devices or smartphones. On such edge devices, only limited computing resources are available, and the amount of computation must be reduced to run deep learning models. Pruning is one of the well-known approaches for deriving lightweight models by eliminating weights, channels, or filters. In this work, we propose “zero-keep filter pruning” for energy-efficient deep neural networks. The proposed method maximizes the number of zero elements in filters by replacing small values with zero and pruning the filter that has the lowest number of zeros. In the conventional approach, the filters that have the highest number of zeros are generally pruned. As a result, through zero-keep filter pruning, the filters that remain in the model contain many zeros. We compared the results of the proposed method with random filter pruning and showed that our method achieves better performance with far fewer non-zero elements and only a marginal drop in accuracy. Finally, we discuss a possible multiplier architecture, a zero-skip multiplier circuit, which skips multiplications by zero to accelerate inference and reduce energy consumption.
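The two-step procedure described above is simple enough to sketch in code. Below is a minimal PyTorch-style illustration, not the authors' implementation: the function name `zero_keep_prune`, the magnitude threshold `tau`, and the mask-based removal (rather than structurally deleting filters and adjusting the following layer) are all illustrative assumptions.

```python
# Minimal sketch of zero-keep filter pruning (illustrative only).
# Assumptions not from the paper: the name `zero_keep_prune`, the
# threshold `tau`, and masking filters instead of structural removal.
import torch
import torch.nn as nn

def zero_keep_prune(conv: nn.Conv2d, tau: float = 1e-2, n_prune: int = 1) -> None:
    with torch.no_grad():
        w = conv.weight                                # (out_ch, in_ch, kH, kW)
        w[w.abs() < tau] = 0.0                         # step 1: replace small values with zero
        zeros_per_filter = (w == 0).flatten(1).sum(1)  # zero count per output filter
        # step 2: prune the filters with the FEWEST zeros, so the
        # zero-rich filters are kept (conventional pruning removes
        # the zero-rich filters instead)
        prune_idx = torch.argsort(zeros_per_filter)[:n_prune]
        w[prune_idx] = 0.0                             # mask out the pruned filters

# Usage: prune 4 of the 32 output filters of a convolution layer.
conv = nn.Conv2d(16, 32, kernel_size=3)
zero_keep_prune(conv, tau=1e-2, n_prune=4)
```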

Highlights

  • In addition to the non-zero element rate (NZER), we introduce another metric, NZER_ORIG, defined as the ratio of the number of non-zero weight elements in a pruned model (PM) to the total number of weight elements in an unpruned original model (UPOM), as given in Equation (3); see the sketch after this list

  • We propose a new filter pruning method, “zero-keep filter pruning”, for computationally efficient deep learning inference

  • The basic idea of the scheme comes from the research finding that filter pruning does not significantly degrade accuracy
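Given the definition in the first highlight, NZER_ORIG can be computed directly from the weight tensors of the two models. The sketch below assumes PyTorch `nn.Module` models and illustrative helper names; the `nzer` helper additionally assumes the conventional reading of NZER (non-zero elements relative to the pruned model's own total), which is not spelled out in this excerpt.

```python
# Sketch of the two sparsity metrics (helper names are illustrative).
import torch.nn as nn

def nonzero_weights(model: nn.Module) -> int:
    """Number of non-zero weight elements across all parameters."""
    return sum(int((p != 0).sum()) for p in model.parameters())

def total_weights(model: nn.Module) -> int:
    """Total number of weight elements across all parameters."""
    return sum(p.numel() for p in model.parameters())

def nzer(pruned: nn.Module) -> float:
    # Assumed conventional definition: non-zeros over the pruned
    # model's own element count.
    return nonzero_weights(pruned) / total_weights(pruned)

def nzer_orig(pruned: nn.Module, original: nn.Module) -> float:
    # Equation (3): non-zero elements in the pruned model (PM) over the
    # total weight elements in the unpruned original model (UPOM).
    return nonzero_weights(pruned) / total_weights(original)
```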


Introduction

There are well-known image classification models such as ResNet [4] and VGGNet [5]. These models achieve high accuracy and fast inference times, but they have a large number of parameters that consume considerable memory and computing resources [6,7]. Models with more parameters tend to work well, but not all systems have the latest hardware. There have been attempts to apply deep learning models to various low-performance edge devices such as IoT devices and smartphones. It is difficult to run a large deep learning model with millions of parameters smoothly on these edge devices. It is possible to run the model on the edge device
