Abstract

Canonical Polyadic Decomposition (CPD) is one of the most popular tensor decomposition methods and plays an important role in big data analysis. For sparse tensors, the dominant computation in CPD, the matricized tensor times Khatri-Rao product (MTTKRP), exhibits discontiguous memory access and becomes the bottleneck to achieving high performance on emerging processor architectures. In this paper, we propose swCPD, an efficient CPD implementation on the many-core Sunway architecture. The main idea of swCPD is a hierarchical partitioning mechanism. From the computation perspective, the 64 CPEs are divided into eight groups, each containing seven workers and one controller. From the data perspective, we partition the sparse tensor at different granularities: blocks, bands, and tiles. Moreover, we develop a mechanism based on register communication for cooperation between CPEs. We evaluate our implementation on both synthesized and real-world datasets, where it achieves better performance than two cutting-edge CPD implementations.
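
To make the MTTKRP bottleneck and the CPE grouping concrete, the following is a minimal sketch in plain Python, not the swCPD implementation. It assumes a 3-way sparse tensor stored in coordinate (COO) form and computes the mode-0 MTTKRP, where each nonzero selects rows of the factor matrices by its own indices, which is the source of the discontiguous memory access. The names `mttkrp_mode0` and `cpe_role` are hypothetical, and the choice of the first CPE in each group as the controller is an assumption; the abstract only states that each group of eight has seven workers and one controller.

```python
def mttkrp_mode0(coords, vals, B, C, num_rows):
    """Mode-0 MTTKRP for a 3-way sparse tensor in COO form.

    coords: list of (i, j, k) index triples of the nonzeros
    vals:   nonzero values, aligned with coords
    B, C:   factor matrices for modes 1 and 2 (lists of rows, R columns each)
    Returns M with M[i][r] = sum over nonzeros x_ijk * B[j][r] * C[k][r].
    """
    rank = len(B[0])
    M = [[0.0] * rank for _ in range(num_rows)]
    for (i, j, k), x in zip(coords, vals):
        # Rows are picked by the nonzero's indices, so consecutive
        # iterations may touch unrelated rows of B, C, and M.
        row_b, row_c, out = B[j], C[k], M[i]
        for r in range(rank):
            out[r] += x * row_b[r] * row_c[r]
    return M


def cpe_role(cpe_id, group_size=8):
    """Map a CPE id (0..63) to (group, role).

    Assumption for illustration only: the first CPE of each group
    acts as the controller and the remaining seven are workers.
    """
    group = cpe_id // group_size
    role = "controller" if cpe_id % group_size == 0 else "worker"
    return group, role


# Tiny example: a 2 x 2 x 2 tensor with three nonzeros and rank 2.
coords = [(0, 0, 0), (0, 1, 1), (1, 1, 0)]
vals = [1.0, 2.0, 3.0]
B = [[1, 2], [3, 4]]
C = [[5, 6], [7, 8]]
M = mttkrp_mode0(coords, vals, B, C, num_rows=2)
roles = [cpe_role(c) for c in range(64)]
```

In this sketch the inner loop over `r` is contiguous, while the accesses to `B[j]`, `C[k]`, and `M[i]` jump with the nonzero pattern. Partitioning the tensor into blocks, bands, and tiles is one way to confine those jumps to ranges of rows that fit in a core's local memory.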
