Abstract

We demonstrate the design of efficient, high-performance artificial intelligence (AI)/deep learning accelerators built around a customized spin-transfer torque magnetic random-access memory (STT-MRAM) and a reconfigurable core. Based on detailed, model-driven design space exploration, we present a design methodology for an innovative scratchpad-assisted on-chip STT-MRAM buffer system for high-performance accelerators. Using an analytically derived expression for the memory occupancy time of AI model weights and activation maps, we adjust the volatility of the STT-MRAM by scaling its thermal stability factor in a process- and temperature-variation-aware manner, optimizing retention time, energy, read/write latency, and area. Through analysis of AI workloads and an accelerator implementation in 14-nm technology, we verify the efficacy of our STT-MRAM-based AI accelerator (STT-AI). Compared to an SRAM-based implementation, the STT-AI accelerator achieves 75% area and 3% power savings at isoaccuracy. Furthermore, with a relaxed bit error rate and a negligible AI accuracy tradeoff, the STT-AI Ultra accelerator achieves 75.4% area and 3.5% power savings over regular SRAM-based accelerators.
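The link between occupancy time, bit error rate, and thermal stability factor can be illustrated with the standard Néel-Arrhenius retention model. This is a minimal sketch, not the paper's own derivation: the attempt time `TAU0_S` of about 1 ns and the function names are assumptions introduced here, and process and temperature variation are not modeled.

```python
import math

# Assumed thermal attempt time (about 1 ns). Illustrative value only,
# not taken from the paper.
TAU0_S = 1e-9


def retention_failure_probability(delta, t_seconds, tau0=TAU0_S):
    """Probability that a single bit flips thermally within t_seconds.

    Uses the Neel-Arrhenius model: mean retention time = tau0 * exp(delta),
    so P_fail = 1 - exp(-t / (tau0 * exp(delta))).
    """
    return -math.expm1(-t_seconds / (tau0 * math.exp(delta)))


def required_delta(t_seconds, target_ber, tau0=TAU0_S):
    """Smallest thermal stability factor that keeps the per-bit failure
    probability at or below target_ber over an occupancy time t_seconds.

    Obtained by inverting the expression above for delta.
    """
    return math.log(t_seconds / (tau0 * -math.log1p(-target_ber)))


if __name__ == "__main__":
    # Hypothetical occupancy times and bit error rates, for illustration.
    for t in (1e-3, 1.0):
        for ber in (1e-9, 1e-5):
            print(f"t={t:g} s, BER={ber:g}: delta >= {required_delta(t, ber):.1f}")
```

Under this model, a shorter occupancy time or a relaxed bit error rate lowers the required thermal stability factor, which is the kind of tradeoff the abstract describes for the relaxed-error-rate STT-AI Ultra design.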
