Abstract

The recently proposed three-dimensional (3-D) neuromorphic processing-in-memory (PIM) architecture provides a promising hardware solution for accelerating convolutional neural networks. However, the limited capacity of the global buffer in this architecture prevents synchronization overhead from being handled efficiently. In this paper, we jointly optimize the allocation of computation and memory resources on the 3-D-stacked PIM architecture, with the objective of minimizing schedule length by removing synchronization overhead. To guarantee that a feasible task schedule is generated, we theoretically derive an upper bound on the rescheduling of each computation task. We then formulate the target problem as a dynamic programming model to obtain an optimal solution. We evaluate our technique on a variety of realistic neural network applications running on the deep learning frameworks Caffe and TensorFlow. The results show that, compared with previous studies, the proposed technique significantly reduces processing time and improves the utilization of processing cores.
