Language-based vectorization and parallelization using intrinsics, OpenMP, TBB and Cilk Plus

Przemysław Stpiczyński

doi:10.1007/s11227-017-2231-3

Open Access

Abstract

The aim of this paper is to evaluate OpenMP, TBB and Cilk Plus as basic language-based tools for simple and efficient parallelization of recursively defined computational problems and other problems that need both task and data parallelization techniques. We show how to use these models of parallel programming to transform the source code of Adaptive Simpson’s Integration into programs that can utilize multiple cores of modern processors. Using the example of the Bellman–Ford algorithm for solving single-source shortest path problems, we show how to improve the performance of data-parallel algorithms by tuning data structures for better utilization of the vector extensions of modern processors. Manual vectorization techniques based on Cilk array notation and intrinsics are presented. We also show how to simplify such optimization using Intel SIMD Data Layout Template (SDLT) containers.
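The abstract mentions tuning data structures for vector extensions and manual vectorization with intrinsics. As a minimal sketch of that idea (not the paper's code), the C function below assumes that tentative distances d and candidate values t are stored as separate contiguous float arrays (a structure-of-arrays layout) and performs the update d[i] = min(d[i], t[i]) four elements at a time with SSE intrinsics from <xmmintrin.h>. The name min_update is hypothetical; with 256-bit or 512-bit vector registers the same pattern would process 8 or 16 floats per instruction.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE: _mm_loadu_ps, _mm_min_ps, _mm_cmplt_ps, ... */
#endif

/* Hypothetical helper: d[i] = min(d[i], t[i]) for 0 <= i < n, where d and t
   are contiguous float arrays (structure-of-arrays layout).
   Returns 1 if any d[i] decreased, 0 otherwise. */
int min_update(float *d, const float *t, size_t n)
{
    assert(n == 0 || (d != NULL && t != NULL));
    int changed = 0;
    size_t i = 0;
#if defined(__SSE__)
    /* Four floats per 128-bit register; unaligned loads and stores. */
    for (; i + 4 <= n; i += 4) {
        __m128 vd = _mm_loadu_ps(d + i);
        __m128 vt = _mm_loadu_ps(t + i);
        /* The movemask is nonzero if any lane has t < d. */
        if (_mm_movemask_ps(_mm_cmplt_ps(vt, vd)) != 0)
            changed = 1;
        _mm_storeu_ps(d + i, _mm_min_ps(vd, vt));
    }
#endif
    /* Scalar tail (or the whole loop when SSE is unavailable). */
    for (; i < n; i++) {
        if (t[i] < d[i]) {
            d[i] = t[i];
            changed = 1;
        }
    }
    return changed;
}
```

The preprocessor guard keeps the sketch portable: when __SSE__ is not defined, only the scalar loop runs and the result is the same.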

Highlights

Multicore and manycore computer architectures have become very attractive for achieving high-performance execution of scientific applications at relatively low costs [5,13,17]
Let us consider the results of experiments performed to compare the five implementations of the Bellman–Ford algorithm considered here: BF1, BF2, BF3, BF4 and BF5
We have shown that Cilk Plus can be applied to parallelize the recursively defined Adaptive Simpson’s Integration Rule and that such an implementation can utilize coprocessors such as Intel Xeon Phi

Summary

Introduction

Multicore and manycore computer architectures have become very attractive for achieving high-performance execution of scientific applications at relatively low costs [5,13,17]. We show how to parallelize the recursively defined Adaptive Simpson’s Integration Rule [7] using OpenMP, Intel TBB and Cilk Plus [9]. We also consider various implementations of the Bellman–Ford algorithm for solving the single-source shortest path problem [4] and examine their performance. These two computational problems have been chosen to demonstrate the most important features of the considered language-based tools.

The following fragment computes the initial Simpson estimate on [a, b] and passes it to the recursive auxiliary function cilkASAux:

```c
{
    double c = (a + b) / 2, h = b - a;       // midpoint and interval length
    double fa = f(a), fb = f(b), fc = f(c);  // integrand at a, b and c
    double S = (h / 6) * (fa + 4 * fc + fb); // Simpson's rule on [a, b]
    return cilkASAux(f, a, b, eps, S, fa, fb, fc, depth);
}
```

[…] hyperthreading, 2.3 GHz), 128 GB RAM, with an Intel Xeon Phi Coprocessor 7120P (61 cores with multithreading, 1.238 GHz, 16 GB RAM), running CentOS 6.5 with Intel Parallel Studio 2017 (C/C++ compiler with support for Cilk Plus, TBB and SDLT). Experiments on the Xeon Phi have been carried out in its native mode.
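The fragment evaluates Simpson's rule once on the whole interval. The formulas below restate that rule and the refinement step of the textbook adaptive method; that the paper's cilkASAux uses exactly this acceptance test and correction term is an assumption. The point that matters for parallelization is the last line: the two recursive calls are independent, so they can run as separate tasks.

```latex
% Simpson's rule on [a,b], with c = (a+b)/2 and h = b - a (as in the fragment)
S(a,b) = \frac{h}{6}\bigl(f(a) + 4f(c) + f(b)\bigr)

% Refinement: Simpson's rule on both halves, d = (a+c)/2, e = (c+b)/2
S(a,c) = \frac{h}{12}\bigl(f(a) + 4f(d) + f(c)\bigr), \qquad
S(c,b) = \frac{h}{12}\bigl(f(c) + 4f(e) + f(b)\bigr), \qquad
S_2 = S(a,c) + S(c,b)

% Standard acceptance test (assumed) with correction term
|S_2 - S(a,b)| \le 15\,\varepsilon
\;\Longrightarrow\;
\int_a^b f(x)\,dx \approx S_2 + \frac{S_2 - S(a,b)}{15}

% Otherwise recurse on both halves with tolerance \varepsilon/2 each
% (until the depth limit is reached); the two calls are independent
\mathrm{AS}(a,b,\varepsilon) = \mathrm{AS}\bigl(a,c,\tfrac{\varepsilon}{2}\bigr) + \mathrm{AS}\bigl(c,b,\tfrac{\varepsilon}{2}\bigr)
```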

Adaptive Simpson’s Integration Rule
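A minimal OpenMP sketch of the recursion, assuming the textbook acceptance test |S_2 - S| <= 15ε. The hypothetical function as_aux takes the same arguments, in the same order, as the call to cilkASAux in the introductory fragment. It spawns the left half as an OpenMP task while the current thread integrates the right half, which mirrors a cilk_spawn/cilk_sync pair. The names as_aux, omp_adaptive_simpson, cube and arctan_prime are illustrative, not the paper's.

```c
#include <assert.h>

typedef double (*integrand_fn)(double);

/* Absolute value without <math.h>, so no -lm is needed. */
static double abs_d(double x) { return x < 0 ? -x : x; }

/* Hypothetical recursive helper (same argument order as cilkASAux):
   S is Simpson's estimate on [a, b]; fa, fb, fc are f(a), f(b), f((a+b)/2). */
static double as_aux(integrand_fn f, double a, double b, double eps,
                     double S, double fa, double fb, double fc, int depth)
{
    double c = (a + b) / 2, h = b - a;
    double d = (a + c) / 2, e = (c + b) / 2;   /* midpoints of the halves */
    double fd = f(d), fe = f(e);
    double Sl = (h / 12) * (fa + 4 * fd + fc); /* Simpson on [a, c] */
    double Sr = (h / 12) * (fc + 4 * fe + fb); /* Simpson on [c, b] */
    double S2 = Sl + Sr;
    if (depth <= 0 || abs_d(S2 - S) <= 15 * eps)
        return S2 + (S2 - S) / 15;             /* accept with correction */

    double left, right;
    /* Left half as an OpenMP task (like cilk_spawn). The other locals are
       firstprivate by default; left must be shared to receive the result. */
    #pragma omp task shared(left)
    left = as_aux(f, a, c, eps / 2, Sl, fa, fc, fd, depth - 1);
    right = as_aux(f, c, b, eps / 2, Sr, fc, fb, fe, depth - 1);
    /* Wait for the left task (like cilk_sync). */
    #pragma omp taskwait
    return left + right;
}

/* Driver: same initial computation as the fragment in the introduction. */
double omp_adaptive_simpson(integrand_fn f, double a, double b,
                            double eps, int depth)
{
    assert(eps > 0);
    double c = (a + b) / 2, h = b - a;
    double fa = f(a), fb = f(b), fc = f(c);
    double S = (h / 6) * (fa + 4 * fc + fb);
    double result = 0.0;
    /* One thread starts the recursion; the team executes the tasks. */
    #pragma omp parallel
    #pragma omp single
    result = as_aux(f, a, b, eps, S, fa, fb, fc, depth);
    return result;
}

/* Example integrands (illustrative). */
double cube(double x) { return x * x * x; }
double arctan_prime(double x) { return 1.0 / (1.0 + x * x); }
```

Calling omp_adaptive_simpson(arctan_prime, 0.0, 1.0, 1e-10, 50) approximates π/4. Compiled without OpenMP support, the pragmas are ignored and the code runs serially; with it, a cutoff (for example an if clause on the task) would typically be needed so that tiny subintervals do not create more task overhead than useful work.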

Bellman–Ford algorithm for the single-source shortest path problem
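The details of the implementations BF1 to BF5 are not reproduced here, so the following is only one possible data-parallel formulation, not the paper's: a Jacobi-style ("pull") sweep over a hypothetical CSR-like structure of arrays holding each vertex's incoming edges. Because every sweep reads the old distances and writes a separate buffer, vertices can be processed in parallel with an OpenMP parallel for, and the inner loop over the contiguous in_src and in_w arrays is a min-reduction that a compiler can vectorize. The type in_graph and the function bellman_ford_pull are illustrative names.

```c
#include <assert.h>
#include <math.h>    /* INFINITY */

/* Hypothetical structure-of-arrays layout of incoming edges (CSR-like):
   the edges entering vertex v are stored at positions
   in_ptr[v] .. in_ptr[v + 1] - 1 of in_src[] (tail vertex) and in_w[] (weight). */
typedef struct {
    int n;              /* number of vertices */
    const int *in_ptr;  /* n + 1 offsets */
    const int *in_src;  /* in_ptr[n] source vertices */
    const float *in_w;  /* in_ptr[n] edge weights */
} in_graph;

/* Jacobi-style ("pull") Bellman-Ford sketch. Each sweep reads d and writes
   tmp, so updates of different vertices are independent.
   Returns 0 when the distances have converged, or -1 if they still change in
   the n-th sweep, i.e. a negative cycle is reachable from the source. */
int bellman_ford_pull(const in_graph *g, int source, float *d, float *tmp)
{
    assert(g->n > 0 && source >= 0 && source < g->n);
    for (int v = 0; v < g->n; v++)
        d[v] = INFINITY;
    d[source] = 0.0f;

    for (int sweep = 0; sweep < g->n; sweep++) {
        int changed = 0;
        #pragma omp parallel for reduction(||:changed)
        for (int v = 0; v < g->n; v++) {
            float best = d[v];
            /* min-reduction over contiguous in_src/in_w entries */
            for (int k = g->in_ptr[v]; k < g->in_ptr[v + 1]; k++) {
                float cand = d[g->in_src[k]] + g->in_w[k];
                if (cand < best)
                    best = cand;
            }
            tmp[v] = best;
            if (best < d[v])
                changed = 1;
        }
        if (!changed)
            return 0;          /* converged: tmp equals d */
        #pragma omp parallel for
        for (int v = 0; v < g->n; v++)
            d[v] = tmp[v];
    }
    return -1;                 /* still changing after n sweeps */
}
```

The caller supplies two arrays of n floats; on return, d holds the shortest distances from the source (INFINITY for unreachable vertices), unless the function reports a reachable negative cycle by returning -1.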

Result

Findings

Conclusions

Journal: The Journal of Supercomputing	Publication Date: Jan 11, 2018
Citations: 11	License type: open-access

Similar Papers

Efficient Language-Based Parallelization of Computational Problems Using Cilk Plus
Przemysław Stpiczyński, 01 Jan 2018

Comparison of Threading Programming Models
Solmaz Salehian ... Yonghong Yan, 01 May 2017

Randomized Work Stealing for Large Scale Soft Real-Time Systems
Jing Li ... Kevin Kieselbach, 01 Nov 2016

Braid: integrating task and data parallelism
E.A West ... A.S Grimshaw, 06 Feb 1995
