Abstract

The conventional method of mapping convolutional weights onto an RRAM array faces two key challenges: 1) suboptimal energy efficiency and 2) RRAM's temporal variation. To address these challenges, we propose a shift-and-duplicate kernel (SDK) convolutional weight mapping architecture. Each kernel is duplicated multiple times and rearranged on different bitlines in a shifted manner, enabling higher intralayer computational parallelism and reducing the number of input data loads. This architecture therefore reduces computational latency and energy consumption in both the forward and backward propagation phases. Furthermore, we introduce a parallel-window size allocation algorithm and a kernel synchronization method. The parallel-window size allocation algorithm balances the interlayer pipeline architecture, improving overall energy efficiency and area efficiency. The kernel synchronization method uses averaging to suppress the effect of temporal variation during weight updates, enhancing the system's robustness during training. In our experiments, the proposed architecture achieves ~6.8× area efficiency and ~2.1× energy efficiency over the conventional interlayer pipeline architecture, and improves classification accuracy by 21.7% under a temporal variation of 1%-5% during on-chip training on the CIFAR-10 dataset.
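To illustrate the shift-and-duplicate idea described above, the sketch below builds a small weight matrix in which copies of one kernel are placed on separate columns (standing in for bitlines) with a row offset per copy, so a single load of the input vector yields several convolution outputs in parallel. This is a minimal 1-D simplification for intuition only; the function name `sdk_weight_matrix` and the `n_windows`/`stride` parameters are hypothetical and not taken from the paper.

```python
import numpy as np

def sdk_weight_matrix(kernel, n_windows=2, stride=1):
    """Hypothetical sketch of SDK-style mapping for a 1-D kernel:
    each column holds a duplicated copy of the kernel, shifted down
    by `stride` rows per column, so one input load computes
    `n_windows` sliding-window dot products at once."""
    k = kernel.size
    rows = k + stride * (n_windows - 1)   # input rows spanned by all windows
    W = np.zeros((rows, n_windows))
    for col in range(n_windows):
        off = col * stride                # shift of this duplicated copy
        W[off:off + k, col] = kernel
    return W

kernel = np.array([1.0, 2.0, 3.0])
W = sdk_weight_matrix(kernel, n_windows=2)
x = np.array([1.0, 1.0, 1.0, 1.0])       # one input-vector load
y = x @ W                                 # two convolution outputs in parallel
```

In a conventional mapping, each of the two outputs would require its own load of the overlapping input window; here the shifted duplicates share a single load, which is the source of the latency and energy savings the abstract claims.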
