Abstract
Recurrent Neural Networks (RNNs) spend most of their execution time performing matrix-vector multiplication (MV-mul). Because the matrices in RNNs exhibit poor reuse and their ever-increasing size exceeds the on-chip storage of mobile/IoT devices, the performance and energy efficiency of MV-mul are determined by those of main-memory DRAM. Therefore, performing MV-mul within DRAM has drawn much attention. However, previous studies did not consider matrix sparsity, the power constraints of DRAM devices, or concurrent DRAM accesses from processors during MV-mul. We propose a main-memory architecture called MViD, which performs MV-mul by placing MAC units inside DRAM banks. For higher computational efficiency, we use a sparse matrix format and exploit quantization. Because of the limited power budget for DRAM devices, we implement the MAC units on only a portion of the DRAM banks. We architect MViD to slow down or pause MV-mul so that memory requests from processors can be processed concurrently while staying within the limited power budget. Our results show that MViD provides 7.2× higher throughput than a baseline system with four DRAM ranks (performing MV-mul in a chip-multiprocessor) while running Deep Speech 2 inference alongside a memory-intensive workload.
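To make the dominant kernel concrete, the sketch below shows a quantized sparse MV-mul in Python. It is illustrative only: the CSR layout, symmetric int8 quantization, and function names are assumptions for this example and are not MViD's actual in-DRAM datapath or the paper's sparse format.

```python
# Illustrative sketch (assumptions: CSR layout, symmetric int8 quantization);
# it shows the MV-mul kernel that dominates RNN inference, not MViD's design.
import numpy as np

def quantize_int8(w):
    """Symmetric int8 quantization: returns int8 values and a scale factor."""
    scale = np.max(np.abs(w)) / 127.0 if np.any(w) else 1.0
    return np.round(w / scale).astype(np.int8), scale

def sparse_mv_mul(values, col_idx, row_ptr, scale, x):
    """MV-mul over a CSR matrix with int8 values; x is the input vector."""
    y = np.zeros(len(row_ptr) - 1, dtype=np.float32)
    for row in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]   # one MAC operation
        y[row] = acc * scale                   # dequantize the row sum
    return y

# Example: a small sparse weight matrix applied to an activation vector.
W = np.array([[0.0, 0.5, 0.0],
              [0.2, 0.0, 0.0]], dtype=np.float32)
rows, cols = np.nonzero(W)
values, scale = quantize_int8(W[rows, cols])
row_ptr = np.searchsorted(rows, np.arange(W.shape[0] + 1))
x = np.array([1.0, 2.0, 3.0], dtype=np.float32)
print(sparse_mv_mul(values, cols, row_ptr, scale, x))  # approximately W @ x
```

In this framing, MViD would execute the inner MAC loop next to the DRAM banks that hold the compressed weights, so only the input and output vectors cross the memory channel.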