Abstract

Large-scale neural networks have attracted much attention for their surprising results on various cognitive tasks such as object detection and image classification. However, the large number of weight parameters in these complex networks is problematic when the models are deployed to embedded systems. The problem is exacerbated in emerging neuromorphic computers, where each weight parameter is stored within a synapse, the primary computational resource of these bio-inspired machines. We describe an effective way of reducing the parameters through recursive tensor factorization: applying the singular value decomposition recursively decomposes the tensor that represents the weight parameters, and the tensor is then approximated by algorithms that jointly minimize the approximation error and the number of parameters. This process factorizes a given network into a deeper, sparser, weight-shared network with good initial weights, which can be fine-tuned by gradient descent.
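
To make the idea concrete, here is a minimal sketch (our illustration, not the authors' exact algorithm) of the basic building block: one truncated-SVD step that replaces a dense m x n weight matrix W with two thinner factors A (m x r) and B (r x n) such that W ~= A @ B. The rank budget r is a hypothetical knob; the paper's algorithms choose it by trading off approximation error against parameter count.

    import numpy as np

    # One truncated-SVD step: replace a dense m x n layer W with two
    # thinner layers A (m x r) and B (r x n) so that W ~= A @ B.
    def svd_compress(W, r):
        U, s, Vt = np.linalg.svd(W, full_matrices=False)
        A = U[:, :r] * np.sqrt(s[:r])          # m x r
        B = np.sqrt(s[:r])[:, None] * Vt[:r]   # r x n
        return A, B

    W = np.random.randn(256, 256)              # toy weight matrix
    A, B = svd_compress(W, r=32)
    err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
    print(f"params: {W.size} -> {A.size + B.size}, rel. error {err:.3f}")

Applying such decompositions recursively to the resulting factors, and to higher-order weight tensors, is what yields the deeper, weight-shared network described above.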

Highlights

  • Large neural networks such as convolutional neural networks have demonstrated state-of-the-art performance in a number of benchmarks in computer vision, automatic speech recognition, natural language processing, audio recognition, etc. [1,2,3,4]

  • We describe a general parameter reduction approach built on new divide-and-conquer tensor approximation methods [20]

  • This paper makes the following contributions: (1) for neuromorphic computers, we evaluated the methods' ability to reduce model size, measured by parameter reduction

Introduction

Large neural networks such as convolutional neural networks have demonstrated state-of-the-art performance in a number of benchmarks in computer vision, automatic speech recognition, natural language processing, audio recognition, etc. [1,2,3,4]. While the enormous computing power available today, driven mainly by GPUs, makes evaluating these networks seem easy, it comes at the cost of high energy consumption. Weight parameters in neural networks are heavily redundant [6]; by exploiting this redundancy, computational cost and space requirements can be minimized while maintaining performance. To this end, several methods have been proposed recently [7,8,9,10,11,12], all of which assume that neural networks are executed on stored-program computers, including GPU-based machines. These traditional computers have several processing bottlenecks, such as limited memory bandwidth and a limited number of processing elements, so the performance benefit (e.g., speed-up) from parameter reduction is not as high as the reduction rate itself.
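
As a back-of-envelope illustration with sizes of our own choosing (not taken from the paper), the reduction rate of a rank-r factorization of an m x n layer is mn / (r(m + n)). On a neuromorphic computer, where each weight occupies a synapse, the synapse count shrinks by exactly this factor, whereas on a stored-program machine the observed speed-up is typically smaller:

    # Illustrative sizes, not taken from the paper.
    m, n, r = 1024, 1024, 64
    dense = m * n             # 1,048,576 weights in the original layer
    factored = r * (m + n)    # 131,072 weights across the two factors
    print(dense / factored)   # 8.0x fewer parameters (and synapses)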
