Abstract

The aim of this paper is to evaluate OpenMP, TBB and Cilk Plus as basic language-based tools for simple and efficient parallelization of recursively defined computational problems and other problems that need both task and data parallelization techniques. We show how to use these models of parallel programming to transform the source code of Adaptive Simpson’s Integration into programs that can utilize multiple cores of modern processors. Using the example of the Bellman–Ford algorithm for solving single-source shortest path problems, we advise how to improve the performance of data-parallel algorithms by tuning data structures for better utilization of the vector extensions of modern processors. Manual vectorization techniques based on Cilk array notation and intrinsics are presented. We also show how to simplify such optimization using Intel SIMD Data Layout Template (SDLT) containers.

Highlights

  • Multicore and manycore computer architectures have become very attractive for achieving high-performance execution of scientific applications at relatively low costs [5,13,17]

  • Experiments compare five implementations of the Bellman–Ford algorithm: BF1, BF2, BF3, BF4 and BF5

  • We have shown that Cilk Plus can be easily applied to parallelize the recursively defined Adaptive Simpson’s Integration Rule, and that such an implementation can utilize coprocessors such as Intel Xeon Phi


Summary

Introduction

Multicore and manycore computer architectures have become very attractive for achieving high-performance execution of scientific applications at relatively low cost [5,13,17]. We show how to parallelize the recursively defined Adaptive Simpson’s Integration Rule [7] using OpenMP, Intel TBB and Cilk Plus [9]. We also consider various implementations of the Bellman–Ford algorithm for solving the single-source shortest path problem [4] and examine their performance. These two computational problems have been chosen to demonstrate the most important features of the considered language-based tools.

    {
      double c = (a + b) / 2, h = b - a;
      double fa = f(a), fb = f(b), fc = f(c);
      double S = (h / 6) * (fa + 4 * fc + fb);
      return cilkASAux(f, a, b, eps, S, fa, fb, fc, depth);

... hyperthreading, 2.3 GHz), 128 GB RAM, with Intel Xeon Phi Coprocessor 7120P (61 cores with multithreading, 1.238 GHz, 16 GB RAM), running under CentOS 6.5 with Intel Parallel Studio version 2017, C/C++ compiler supporting Cilk Plus, TBB and SDLT. Experiments on Xeon Phi have been carried out using its native mode.
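
The fragment in the Introduction computes the initial Simpson estimate S and passes it to a recursive helper, cilkASAux, whose body is not shown on this page. As a rough illustration of how such a recursion can be expressed with tasks, the following sketch uses OpenMP tasks rather than Cilk Plus. The names adaptiveSimpson and adaptiveSimpsonAux, the error test with the factor 15, and the halving of eps at each level are assumptions based on the textbook form of the method, not the paper's code. Without OpenMP enabled at compile time the pragmas are ignored and the code runs sequentially.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical type alias for the integrand (an assumption for this sketch).
using Integrand = double (*)(double);

// Hypothetical recursive helper; the argument order mirrors the call visible
// in the fragment (f, a, b, eps, S, fa, fb, fc, depth).
static double adaptiveSimpsonAux(Integrand f, double a, double b, double eps,
                                 double S, double fa, double fb, double fc,
                                 int depth) {
    const double c = (a + b) / 2, h = b - a;
    const double d = (a + c) / 2, e = (c + b) / 2;
    const double fd = f(d), fe = f(e);
    // Simpson's rule on the two halves [a, c] and [c, b].
    const double Sleft = (h / 12) * (fa + 4 * fd + fc);
    const double Sright = (h / 12) * (fc + 4 * fe + fb);
    const double S2 = Sleft + Sright;
    // Stop when the refined estimate agrees with the coarse one,
    // or when the recursion budget is exhausted.
    if (depth <= 0 || std::fabs(S2 - S) <= 15 * eps)
        return S2 + (S2 - S) / 15;
    double left = 0.0, right = 0.0;
    // The two halves are independent, so one can run as a separate task.
    #pragma omp task shared(left)
    left = adaptiveSimpsonAux(f, a, c, eps / 2, Sleft, fa, fc, fd, depth - 1);
    right = adaptiveSimpsonAux(f, c, b, eps / 2, Sright, fc, fb, fe, depth - 1);
    #pragma omp taskwait  // wait for the left half before combining
    return left + right;
}

// Hypothetical entry point: the initial estimate is computed as in the fragment.
double adaptiveSimpson(Integrand f, double a, double b, double eps, int depth) {
    assert(eps > 0.0);
    const double c = (a + b) / 2, h = b - a;
    const double fa = f(a), fb = f(b), fc = f(c);
    const double S = (h / 6) * (fa + 4 * fc + fb);
    double result = 0.0;
    // One thread starts the recursion; the team executes the spawned tasks.
    #pragma omp parallel
    #pragma omp single
    result = adaptiveSimpsonAux(f, a, b, eps, S, fa, fb, fc, depth);
    return result;
}
```

In practice a cutoff that stops creating tasks below some recursion depth is usually worth adding, since very small tasks cost more to schedule than to compute; that refinement is omitted here to keep the sketch short.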

Adaptive Simpson’s Integration Rule
Bellman–Ford algorithm for the single-source shortest path problem
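
The abstract's point about tuning data structures for vector extensions can be illustrated with a small sketch. It is not one of the paper's BF1 to BF5 variants, whose code is not shown on this page. It assumes a dense, row-major weight matrix in which a missing edge is stored as infinity, and a graph without negative cycles; the function name bellmanFordDense and the constant kInf are hypothetical. With this layout, the relaxation of all edges leaving one vertex reads a contiguous row and writes a contiguous distance array, which is the access pattern a compiler can vectorize.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Marker for "no edge" / "unreachable" (an assumption of this sketch).
const double kInf = std::numeric_limits<double>::infinity();

// w is an n x n row-major matrix: w[u * n + v] is the weight of edge u -> v,
// or kInf if the edge does not exist. Returns shortest distances from source.
std::vector<double> bellmanFordDense(const std::vector<double>& w,
                                     std::size_t n, std::size_t source) {
    assert(w.size() == n * n && source < n);
    std::vector<double> dist(n, kInf);
    dist[source] = 0.0;
    // At most n - 1 rounds are needed when there are no negative cycles.
    for (std::size_t round = 0; round + 1 < n; ++round) {
        bool changed = false;
        for (std::size_t u = 0; u < n; ++u) {
            const double du = dist[u];
            if (du == kInf) continue;  // nothing to relax from an unreached vertex
            const double* row = &w[u * n];
            double* d = dist.data();
            int anyChange = 0;
            // Unit-stride loop over all targets v; du is a loop-invariant scalar.
            #pragma omp simd reduction(|:anyChange)
            for (std::size_t v = 0; v < n; ++v) {
                const double cand = du + row[v];
                if (cand < d[v]) {
                    d[v] = cand;
                    anyChange |= 1;
                }
            }
            changed = changed || (anyChange != 0);
        }
        if (!changed) break;  // distances are stable, stop early
    }
    return dist;
}
```

The dense matrix trades memory for regular access, so it suits small or dense graphs; for sparse graphs a layout that keeps sources, targets and weights in separate contiguous arrays is the more common choice.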
Result
Findings
Conclusions