Abstract

In this paper, we propose efficient algorithms for redistributing matrices from a 1D or 2D irregular format to the block cyclic data distribution (BCDD) format. These algorithms can be much faster than the BLACS routine <monospace>PXGEMR2D</monospace>, and they can be used to combine direct methods with iterative methods. The proposed algorithms split the communication into two phases, one among processes in the same process column and the other among processes in the same process row, and they divide the whole redistribution task into several independent sub-communications. As a result, the communication time is greatly reduced compared with BLACS. Performance results on the Tianhe-2A supercomputer show that, with 4096 processes, our algorithms can be <inline-formula><tex-math notation="LaTeX">$2\times$</tex-math></inline-formula> to <inline-formula><tex-math notation="LaTeX">$5\times$</tex-math></inline-formula> faster than the BLACS routine <monospace>PXGEMR2D</monospace>.
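The two-phase idea summarized above can be illustrated with a small single-process Python simulation. This is a sketch under stated assumptions, not the paper's implementation. It assumes a Pr × Pc process grid on which process (r, c) initially owns the contiguous rows `row_starts[r]` to `row_starts[r+1]` and columns `col_starts[c]` to `col_starts[c+1]` (a 2D irregular layout). It also assumes block sizes `nb_r` × `nb_c` and a BCDD whose first block lives on process (0, 0). The helper names `bcdd_owner` and `two_phase_redistribute` are hypothetical. Phase 1 moves each element only within its process column, to the process row that owns its global row under BCDD. Phase 2 moves it only within its process row, to the owning process column. After both phases, every element sits on the process that BCDD assigns it to.

```python
from collections import defaultdict


def bcdd_owner(g: int, nb: int, nprocs: int) -> int:
    """Process coordinate owning global index g in a block-cyclic layout
    with block size nb over nprocs processes (assumed: first block on process 0)."""
    return (g // nb) % nprocs


def two_phase_redistribute(row_starts, col_starts, nb_r, nb_c):
    """Simulate an irregular-to-BCDD redistribution on a Pr x Pc grid.

    Assumed source layout: process (r, c) owns rows
    [row_starts[r], row_starts[r+1]) and columns [col_starts[c], col_starts[c+1]).
    Returns {(r, c): {(i, j): value}} after both phases.
    """
    pr, pc = len(row_starts) - 1, len(col_starts) - 1
    # Initial irregular layout; each value is its own global (i, j)
    # so the final placement can be checked.
    grid = {
        (r, c): {(i, j): (i, j)
                 for i in range(row_starts[r], row_starts[r + 1])
                 for j in range(col_starts[c], col_starts[c + 1])}
        for r in range(pr) for c in range(pc)
    }

    # Phase 1: exchange only within each process column c, sending every
    # element to the process row that owns its global row in BCDD.
    after_cols = defaultdict(dict)
    for (r, c), block in grid.items():
        for (i, j), v in block.items():
            after_cols[(bcdd_owner(i, nb_r, pr), c)][(i, j)] = v

    # Phase 2: exchange only within each process row r, sending every
    # element to the process column that owns its global column in BCDD.
    after_rows = defaultdict(dict)
    for (r, c), block in after_cols.items():
        for (i, j), v in block.items():
            after_rows[(r, bcdd_owner(j, nb_c, pc))][(i, j)] = v
    return dict(after_rows)


if __name__ == "__main__":
    # 10 x 10 matrix on a 2 x 2 grid with uneven row/column splits, 2 x 2 blocks.
    parts = two_phase_redistribute([0, 3, 10], [0, 7, 10], 2, 2)
    # Prints: [((0, 0), 36), ((0, 1), 24), ((1, 0), 24), ((1, 1), 16)]
    print(sorted((p, len(b)) for p, b in parts.items()))
```

In an MPI code, each phase would presumably become a set of independent exchanges inside per-column and per-row communicators (created, for example, with `MPI_Comm_split`). This is one way a redistribution can be broken into independent sub-communications that never span the whole process grid at once.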
