Abstract
Network sparsification is an effective technique for accelerating Deep Neural Network (DNN) inference. However, existing sparsification techniques often rely on unstructured sparsity, which yields limited benefits in practice: the sparse storage formats it requires introduce significant memory and computational overhead during address generation and gradient updates. Moreover, many of these solutions target only the inference phase, neglecting the equally important training phase. In this paper, we introduce STCO, a novel Sparse Tensor Compilation Optimization technique that significantly improves training efficiency through structured sparse tensor compilation. Central to STCO is the Tensorization-aware Index Entity (TIE) format, which represents structured sparse tensors compactly by eliminating redundant indices and minimizing storage overhead. The TIE format underpins the Address-Carry flow (AC flow) pass, which optimizes data layout at the computational-graph level and enables more compact and efficient sparse tensor storage, while a shape inference pass uses the AC flow to derive optimized tensor shapes, further improving sparse tensor operations. Because the Address-Carry TIE flow dynamically tracks nonzero addresses, the benefits of sparse optimization extend to both forward and backward propagation; this seamless integration into the training pipeline allows a transition to sparse tensor compilation without significant modifications to existing codebases. To further boost training performance, we implement an operator-level AC flow optimization pass tailored to structured sparse tensors, which generates addresses efficiently and minimizes computational overhead during sparse tensor operations. STCO can be integrated into a variety of frameworks and compilers, providing a robust solution for efficient training with structured sparse tensors. Experiments show that STCO achieves speedups of 3.64×, 5.43×, 4.89×, and 3.91× over state-of-the-art sparse formats on VGG16, ResNet-18, MobileNetV1, and MobileNetV2, respectively. These results underscore the efficiency of our approach in leveraging structured sparsity to accelerate DNN training.
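For readers unfamiliar with structured-sparse indexing, the following is a minimal Python sketch of the kind of index-redundancy elimination the abstract attributes to the TIE format. The abstract does not specify TIE's actual layout, so this sketch uses 2:4 (N:M) structured sparsity as a stand-in; the names `tie_pack` and `tie_addresses` are hypothetical and are not part of STCO's API.

```python
# Illustrative sketch only: the abstract does not define TIE's exact layout.
# This demonstrates the general idea behind compact structured-sparse
# indexing, using 2:4 (N:M) row sparsity as an assumed stand-in.
import numpy as np

def tie_pack(dense, n=2, m=4):
    """Pack a 2D matrix with N:M structured sparsity along rows.

    Instead of one (row, col) pair per nonzero (as in COO), each group of
    `m` columns stores only `n` values plus `n` small intra-group offsets,
    which is the kind of index-redundancy elimination the abstract describes.
    """
    rows, cols = dense.shape
    assert cols % m == 0
    groups = dense.reshape(rows, cols // m, m)
    # Keep the n largest-magnitude entries per group (a common N:M pruning rule).
    offsets = np.argsort(-np.abs(groups), axis=-1)[..., :n]
    offsets.sort(axis=-1)  # ascending offsets give a stable packed layout
    values = np.take_along_axis(groups, offsets, axis=-1)
    return values, offsets.astype(np.uint8)  # offsets need only log2(m) bits

def tie_addresses(offsets, m=4):
    """Recover absolute column addresses from compact intra-group offsets,
    mirroring the on-the-fly address generation an AC-flow-style pass emits."""
    rows, n_groups, n = offsets.shape
    base = (np.arange(n_groups) * m)[None, :, None]  # start column of each group
    return base + offsets  # shape (rows, n_groups, n)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((4, 8)).astype(np.float32)
    vals, offs = tie_pack(w)
    cols = tie_addresses(offs)
    # Verify the round trip: scatter packed values back into their dense slots.
    recon = np.zeros_like(w)
    rows_idx = np.arange(w.shape[0])[:, None, None]
    recon[rows_idx, cols] = vals
    print("storage per nonzero: 1 value + 2-bit offset (vs. 2 int32 indices in COO)")
```

The point of the sketch is the storage asymmetry: a general-purpose format such as COO carries two full integer coordinates per nonzero, whereas a structured format needs only a 2-bit intra-group offset, and absolute addresses can be regenerated on the fly, which is the role the abstract assigns to the AC flow.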