Abstract
Neural networks (NNs) have been widely adopted in application domains ranging from image and video recognition to natural language processing. Recent studies show that deeper NNs with more parameters greatly improve output accuracy. However, complex NNs incur intensive memory accesses: since the weights of even a single layer can exceed the on-chip storage capacity, the data usually need to be partitioned. Compression can effectively reduce the storage space requirements, but existing work does not consider how to partition the resulting sparse matrices. In this paper, we propose a sparse NN data partition and loop scheduling scheme. We establish a compression efficiency model of the matrix sparsification algorithm and design a partition selection method based on the sparsity characteristics analyzed with this model. Finally, we design a loop scheduling scheme based on the selected partition size. Experimental results show that the average memory accesses of each layer are reduced to 68% of the original, and the throughput of the three evaluated networks increases by an average of 1.66x.
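To make the partition-selection idea concrete, the following is a minimal sketch (not the paper's actual method) of choosing a tile size for a sparse weight matrix under an on-chip buffer constraint: it estimates the CSR-compressed footprint of a candidate tile from the layer's sparsity and picks the largest tile that fits, so fewer off-chip transfers are needed. All names here (BUFFER_BYTES, csr_tile_bytes, select_tile_size) and the specific storage-cost formula are illustrative assumptions, not the authors' model.

    # Sketch: pick the largest square tile of a sparse weight matrix whose
    # CSR-compressed footprint fits in the on-chip buffer.
    # Hypothetical constants/names; not the paper's compression efficiency model.

    BUFFER_BYTES = 512 * 1024  # assumed on-chip buffer capacity (512 KiB)

    def csr_tile_bytes(tile: int, sparsity: float,
                       val_bytes: int = 2, idx_bytes: int = 2) -> int:
        """Estimate CSR storage of a tile x tile block with the given sparsity
        (fraction of zero weights): value + column index per non-zero,
        plus one row-pointer entry per row."""
        nnz = int(tile * tile * (1.0 - sparsity))
        return nnz * (val_bytes + idx_bytes) + (tile + 1) * idx_bytes

    def select_tile_size(sparsity: float,
                         candidates=(64, 128, 256, 512, 1024)) -> int:
        """Return the largest candidate tile whose compressed footprint fits
        on chip; larger tiles mean fewer partition loads per layer."""
        best = candidates[0]
        for t in candidates:  # candidates assumed in ascending order
            if csr_tile_bytes(t, sparsity) <= BUFFER_BYTES:
                best = t
        return best

    if __name__ == "__main__":
        for s in (0.5, 0.8, 0.95):
            print(f"sparsity={s:.2f} -> tile={select_tile_size(s)}")

Under these assumptions, a sparser layer admits a larger tile (e.g., 95% sparsity fits a 1024x1024 tile where 50% sparsity only fits 256x256), which is the intuition behind selecting the partition size from the sparsity characteristics.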