Neural networks (NNs) have been widely adopted across application domains, ranging from image and video recognition to natural language processing. Recent studies show that deeper NNs with more parameters substantially improve output accuracy; however, such complex NNs incur intensive memory accesses. Since the weights of even a single layer can exceed the on-chip storage capacity, the data usually must be partitioned, and compression can effectively reduce the storage requirements. However, no prior research has considered how to partition the sparse matrix. In this paper, we propose a data partition and loop scheduling scheme for sparse NNs. We establish a compression efficiency model for the matrix sparsification algorithm and design a partition selection method based on the sparsity characteristics analyzed by this model. Finally, we design a loop scheduling scheme based on the selected partition size. The experimental results show that the average memory accesses of each layer are reduced to 68% of the original; additionally, the throughput of the three networks increases to 1.66 times the original on average.
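To make the idea concrete, the sketch below illustrates one plausible form of sparsity-aware partition selection; it is not the paper's actual compression efficiency model. It estimates the CSR footprint of each candidate weight tile and picks the largest tile shape whose worst-case compressed size still fits an on-chip buffer. The tile shapes, byte widths, and buffer size are all illustrative assumptions.

```python
# Hypothetical illustration of sparsity-aware partition selection,
# NOT the paper's actual model. CSR storage costs and the buffer
# size below are assumptions for the sake of the example.
import numpy as np

VALUE_BYTES = 2   # assumed 16-bit quantized weight values
INDEX_BYTES = 2   # assumed 16-bit column indices / row pointers

def csr_tile_bytes(tile: np.ndarray) -> int:
    """CSR storage cost of one tile: values + column indices + row pointers."""
    rows = tile.shape[0]
    nnz = int(np.count_nonzero(tile))
    return nnz * VALUE_BYTES + nnz * INDEX_BYTES + (rows + 1) * INDEX_BYTES

def select_partition(weights: np.ndarray, buffer_bytes: int,
                     candidate_tiles=((64, 64), (128, 128), (256, 256))):
    """Return the largest candidate tile shape whose worst-case
    compressed (CSR) footprint fits the on-chip buffer."""
    best = None
    for th, tw in candidate_tiles:
        worst = 0
        for r in range(0, weights.shape[0], th):
            for c in range(0, weights.shape[1], tw):
                worst = max(worst, csr_tile_bytes(weights[r:r+th, c:c+tw]))
        if worst <= buffer_bytes and (best is None or th * tw > best[0] * best[1]):
            best = (th, tw)
    return best

# Example: a 512x512 layer at ~80% sparsity with an assumed 64 KiB buffer.
rng = np.random.default_rng(0)
w = rng.standard_normal((512, 512)) * (rng.random((512, 512)) < 0.2)
print(select_partition(w, buffer_bytes=64 * 1024))
```

The loop scheduling step described in the abstract would then iterate over tiles of the selected size, streaming one compressed tile into the buffer at a time.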