Abstract

The optimizations discussed in this chapter significantly improved concurrency on both Intel Xeon Phi coprocessors and Intel Xeon processors. On the coprocessor, OpenMP scaling with 240 threads relative to one thread rose from 38x in the first version to 100x. Scaling on the processor similarly improved from 10x to 16x. The chapter discusses source modifications that transform a fine-grained thread-parallel approach into a more coarse-grained one, memory allocation considerations on Intel Xeon Phi coprocessors, and source transformations that improve vectorization. In addition, it briefly demonstrates how new features in VTune Amplifier XE can be used for OpenMP analysis.
