Abstract

General Matrix Multiplication (GEMM) is the key operation in Deep Neural Networks (DNNs). While dense GEMM uses SIMD CPUs efficiently, sparse GEMM is much less efficient, especially at the modest levels of unstructured sparsity common in DNN inference/training. Thus, most DNNs use dense GEMM. In this paper, we propose SAVE, a novel vector engine for CPUs that efficiently skips ineffectual computation due to sparsity in dense DNN implementations. SAVE's hardware extensions to the vector pipeline are transparent to software. SAVE accelerates FP32 and mixed-precision kernels with unstructured sparsity from both weights and activations. Further, SAVE is not DNN-specific and can potentially speed up any vector workload with sparsity. To evaluate SAVE, we use simulations of a 28-core machine and run VGG16, ResNet-50, and GNMT, with and without pruning. With realistic sparsity, SAVE accelerates inference by 1.37x-1.68x and end-to-end training by 1.28x-1.64x.
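To make the notion of "ineffectual computation" concrete, the following minimal C sketch shows a scalar GEMM inner loop that skips multiply-accumulates whenever an operand is zero. This is purely illustrative and not the paper's mechanism: SAVE performs the equivalent skipping in hardware at the vector-lane level, transparently to software, and the function name gemm_skip_zeros is a hypothetical placeholder.

```c
#include <stddef.h>

/* Conceptual sketch only: C = C + A * B, row-major, with unstructured
 * sparsity in A. When an element of A is zero, its entire inner loop of
 * multiply-accumulates is "ineffectual" and can be skipped; SAVE's vector
 * pipeline exploits the same property in hardware. */
void gemm_skip_zeros(const float *A, const float *B, float *C,
                     size_t M, size_t N, size_t K)
{
    for (size_t i = 0; i < M; i++) {
        for (size_t k = 0; k < K; k++) {
            float a = A[i * K + k];
            if (a == 0.0f)      /* zero operand: contributes nothing to C */
                continue;       /* skip the ineffectual work */
            for (size_t j = 0; j < N; j++)
                C[i * N + j] += a * B[k * N + j];
        }
    }
}
```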
