Multi-dimensional, large-scale, and sparse data, which can be neatly represented by sparse tensors, are increasingly used in various applications such as data analysis and machine learning. A high-performance sparse tensor-vector product (SpTV), one of the most fundamental operations of processing sparse tensors, is necessary for improving efficiency of related applications. In this article, we propose aeSpTV, an adaptive and efficient SpTV framework on Sunway TaihuLight supercomputer, to solve several challenges of optimizing SpTVon high-performance computing platforms. First, to map SpTV to Sunway architecture and tame expensive memory access latency and parallel writing conflict due to the intrinsic irregularity of SpTV, we introduce an adaptive SpTV parallelization. Second, to co-execute with the parallelization design while still ensuring high efficiency, we design a sparse tensor data structure named CSSoCR. Third, based on the adaptive SpTV parallelization with the novel tensor data structure, we present an autotuner that chooses the most befitting tensor partitioning method for aeSpTV using the variance analysis theory of mathematical statistics to achieve load balance. Fourth, to further leverage the computing power of Sunway, we propose customized optimizations for aeSpTV. Experimental results show that aeSpTV yields good sacalability on both thread-level and process-level parallelism of Sunway. It achieves a maximum GFLOPS of 195.69 on 128 processes. Additionally, it is proved that optimization effects of the partitioning autotuner and optimization techniques are remarkable.
Read full abstract