Abstract

The conventional method of mapping convolutional weights onto RRAM arrays faces two key challenges: 1) nonoptimal energy efficiency and 2) RRAM's temporal variation. To address these challenges, we propose the shift and duplicate kernel (SDK) convolutional weight mapping architecture. Each kernel is duplicated multiple times and rearranged on different bitlines in a shifted manner, enabling higher intralayer computational parallelism and reducing the number of input data loads. Hence, this architecture reduces the computational latency and energy consumption in both the forward and backward propagation phases. Furthermore, we introduce a parallel-window size allocation algorithm and a kernel synchronization method. The parallel-window size allocation algorithm balances the interlayer pipeline architecture, improving the overall energy efficiency and area efficiency. The kernel synchronization method uses averaging to suppress the effect of temporal variation during weight updates, enhancing the system's robustness for training. Our experimental results show that the proposed architecture achieves ~6.8× higher area efficiency and ~2.1× higher energy efficiency than the conventional interlayer pipeline architecture. A significant classification-accuracy improvement of 21.7% under a temporal variation of 1%-5% is achieved during on-chip training on the CIFAR-10 dataset.
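The shift-and-duplicate idea can be illustrated with a minimal 1-D NumPy sketch (an assumption for illustration only; the paper maps 2-D kernels across RRAM bitlines, and the function name `sdk_matrix`, the parallel-window count `pw`, and the kernel values below are all hypothetical). Each duplicated kernel copy occupies its own column (bitline), shifted by the stride, so a single loaded input segment produces several adjacent convolution outputs in one matrix-vector multiplication:

```python
import numpy as np

def sdk_matrix(kernel, pw, stride=1):
    """Build a shift-and-duplicate weight matrix: pw shifted copies
    of a k-tap kernel, one copy per column (bitline)."""
    k = len(kernel)
    rows = k + (pw - 1) * stride          # wordlines spanned by the shared input
    W = np.zeros((rows, pw))
    for col in range(pw):
        off = col * stride
        W[off:off + k, col] = kernel      # duplicated, shifted kernel copy
    return W

kernel = np.array([1.0, -2.0, 3.0])       # illustrative 3-tap kernel
pw = 4                                    # number of parallel windows
W = sdk_matrix(kernel, pw)

x = np.arange(6, dtype=float)             # one input load of length k + pw - 1
out_sdk = x @ W                           # one MVM -> pw outputs at once

# Reference: conventional one-window-at-a-time sliding correlation
out_ref = np.array([x[i:i + 3] @ kernel for i in range(pw)])
assert np.allclose(out_sdk, out_ref)

# Kernel-synchronization sketch: under temporal variation the pw duplicated
# copies drift apart during weight update; element-wise averaging and
# writing the mean back to every copy suppresses the noise.
rng = np.random.default_rng(0)
noisy_copies = kernel + rng.normal(0.0, 0.02, size=(pw, len(kernel)))
synced = noisy_copies.mean(axis=0)        # averaged kernel, rewritten to all copies
```

The sketch shows why input loading drops: the same segment `x` is reused by all `pw` columns, whereas the conventional mapping would reload each overlapping window separately.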
