In recent years, Intel promotes its new product Xeon Phi coprocessor, which is similar to the x86 architecture coprocessor. It has about 60 cores and can be regarded as a single computing node, with the computing power that cannot be ignored. This work aims to improve the workload balance by parallel loop self-scheduling scheme performed on Xeon Phi-based computer cluster. The proposed concept is implemented by hybrid MPI and OpenMP parallel programming in C language. Since parallel loop self-scheduling composes of static and dynamic allocation, weighting algorithm is adopted in the static part, while the well-known loop self-scheduling is adopted in dynamic part. The loop block is partitioned according to the weighting of MIC and HOST nodes. Accordingly, Xeon Phi with many-core is adopted to implement parallel loop self-scheduling. Finally, we test the performance in the experiments by four applicable problems: matrix multiplication, sparse matrix multiplication, Mandelbrot set and circuit meet. The experimental results indicate how to do the weight allocation and which scheduling method can achieve the best performance.
Read full abstract