Abstract

In this paper we investigate different multithreading programming paradigms on x86 CPU architectures, studying the recently released Intel Xeon Phi coprocessor and commonly used Intel Xeon processors, as well as the NVIDIA K20 GPU, which represents the cutting edge of general-purpose graphics processing units. The numerical algorithm selected to address the problem is the power method, which is widely used to compute the dominant eigenvalue of a matrix; this work focuses on dense linear algebra. The multi-core and many-core parallelization techniques considered include OpenMP, Intel Cilk Plus, and Intel Threading Building Blocks (TBB), along with optimized computing libraries such as the Intel Math Kernel Library (MKL) and the NVIDIA CUDA Basic Linear Algebra Subroutines (cuBLAS) library. Optimized implementations of these techniques were applied separately to each of the aforementioned architectures. Because a single programming model may not satisfy the growing performance demand, we also explored possible combinations of these languages. The study shows that a hybrid pattern of multithreading and data parallelism via explicit vectorization maximizes performance on x86 architectures, allowing us to obtain 80% of the sustainable peak performance in double precision on the Intel Many Integrated Core (MIC) architecture; in single precision this figure reaches 96%. In addition, this approach delivers reasonable performance while requiring the least development time. The number of iterations until convergence is roughly the same on the CPU and GPU architectures. The GPU performs better for small matrix sizes, whereas the Intel Xeon Phi coprocessor excels for large sizes with better scalability.
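For context, the sketch below shows a minimal power-method iteration for a dense matrix, parallelized with OpenMP in the spirit of the multithreading techniques named above. The row-major layout, the 2-norm normalization, the tolerance-based stopping criterion, and the helper names `matvec` and `power_method` are illustrative assumptions; they do not reproduce the paper's actual implementation or its vectorized MIC/GPU variants.

```c
/*
 * Minimal sketch of the power method for the dominant eigenvalue of a dense
 * n x n matrix, parallelized with OpenMP.  Layout, tolerance handling, and
 * helper names are illustrative assumptions, not the paper's implementation.
 */
#include <math.h>
#include <stdlib.h>
#include <omp.h>

/* y = A*x for a dense row-major matrix; rows are distributed across threads. */
static void matvec(int n, const double *A, const double *x, double *y)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        for (int j = 0; j < n; j++)
            sum += A[(size_t)i * n + j] * x[j];
        y[i] = sum;
    }
}

/* Iterates x <- A*x / ||A*x|| starting from a nonzero vector x and returns
 * the dominant-eigenvalue estimate once successive estimates differ by less
 * than tol or max_iter iterations have been performed. */
double power_method(int n, const double *A, double *x, int max_iter, double tol)
{
    double *y = malloc((size_t)n * sizeof(double));
    double lambda = 0.0;

    for (int it = 0; it < max_iter; it++) {
        matvec(n, A, x, y);

        /* 2-norm of y, accumulated with an OpenMP reduction. */
        double norm = 0.0;
        #pragma omp parallel for reduction(+:norm)
        for (int i = 0; i < n; i++)
            norm += y[i] * y[i];
        norm = sqrt(norm);

        /* Normalize to obtain the next iterate; the norm converges to the
         * magnitude of the dominant eigenvalue. */
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            x[i] = y[i] / norm;

        if (fabs(norm - lambda) < tol) {
            lambda = norm;
            break;
        }
        lambda = norm;
    }

    free(y);
    return lambda;
}
```

In a production variant the dense matrix-vector product would typically be replaced by an optimized BLAS call (e.g. MKL or cuBLAS), with the surrounding threading model supplying the normalization and convergence test.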
