Abstract
We propose a novel architecture to efficiently perform sparse tensor decomposition and completion. As a generalization of vectors and matrices, tensors are widely used to process high-dimensional data. Sparse tensor decomposition (SpTD) is not only an emerging tensor analysis technique but also an effective tool for reducing the storage and computation costs of tensors. However, conventional general-purpose processors are inefficient at performing SpTD, mainly because of (i) variable degrees of sparsity and flexible buffer-size requirements, and (ii) the difficulty of fusing multiple execution kernels for better performance. For designers of domain-specific accelerators, on the other hand, the diversity of decomposition algorithms is a further problem that must be addressed. To address these challenges, we propose a unified abstraction for SpTD algorithms and design a specialized accelerator. First, we formulate two types of core kernels (SpLrMM and LrSampling) that serve as a standard form fitting a broad range of SpTD algorithms. Second, we design a sparse tensor engine (STE) to efficiently perform SpTD. STE uses a processing element (PE)-interactive architecture in which PEs can be flexibly grouped via a Network-on-Chip (NoC) to share buffer capacity, bandwidth, and compute resources. We evaluate our accelerator with extensive experiments, and it achieves an average speedup of 45× over CPU and 29× over GPU.
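The abstract does not define SpLrMM or LrSampling, so the sketch below does not attempt to reproduce them. As a general illustration of the kind of sparse-times-low-rank computation that SpTD workloads involve, it shows the widely used MTTKRP step of CP decomposition on a tensor stored in coordinate (COO) format. The function name `mttkrp_coo`, the data layout, and the toy values are assumptions made for this example only.

```python
def mttkrp_coo(coords, vals, factors, mode, rank):
    """Matricized-tensor-times-Khatri-Rao-product along `mode`.

    coords  : list of index tuples, one per nonzero (COO format)
    vals    : list of nonzero values, same length as coords
    factors : one dense factor matrix per mode, each a list of rows of length `rank`
    Only nonzeros are visited, so the work scales with nnz * rank * (number of modes).
    """
    n_rows = len(factors[mode])
    out = [[0.0] * rank for _ in range(n_rows)]
    for idx, v in zip(coords, vals):
        row = out[idx[mode]]
        for r in range(rank):
            prod = v
            # Multiply by the matching row entry of every factor except `mode`.
            for m, factor in enumerate(factors):
                if m != mode:
                    prod *= factor[idx[m]][r]
            row[r] += prod
    return out


# Toy 2x2x2 tensor with three nonzeros and rank-2 factors (illustrative values).
coords = [(0, 0, 0), (1, 0, 1), (1, 1, 0)]
vals = [1.0, 2.0, 3.0]
A = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0, 2.0], [3.0, 4.0]]
C = [[1.0, 1.0], [2.0, 0.5]]

result = mttkrp_coo(coords, vals, [A, B, C], mode=0, rank=2)
print(result)
```

The number of nonzeros that fall on each output row depends entirely on the data, which gives a rough sense of why sparsity that varies across tensors makes fixed buffer sizing awkward on general-purpose hardware.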