Abstract

Intel recently released the first commercial boards of its Many Integrated Core (MIC) Architecture. MIC is Intel's solution for the domain of throughput computing, currently dominated by general purpose programming on graphics processors (GPGPU). MIC allows the use of the more familiar x86 programming model and supports standard technologies such as OpenMP, MPI, and Intel's Threading Building Blocks (TBB). This should make it possible to develop for both throughput and latency devices using a single code base. In ATLAS Software, track reconstruction has been shown to be a good candidate for throughput computing on GPGPU devices. In addition, the newly proposed offline parallel event-processing framework, GaudiHive, uses TBB for task scheduling. The MIC is thus, in principle, a good fit for this domain. In this paper, we report our experience porting and optimizing ATLAS tracking algorithms for the MIC, comparing the programmability and relative cost/performance of the MIC against those of current GPGPUs and latency-optimized CPUs.

Highlights

  • The main selling point of the Many Integrated Core (MIC) architecture is that it is x86: existing code is relatively straightforward to port, and optimization for both the coprocessor and the CPU can be done on a single code base.

  • Unlike OpenCL, ISPC does not restrict its code generation to kernels that iterate over elements only in outer loops, but rather targets the single-instruction, multiple-data (SIMD) vector units of the CPU. We find that this works well, although ISPC is currently C-only and its support for the MIC's VPU is inadequate to the point of being non-existent.

  • One approach to track fitting on Graphics Processing Units (GPUs) and the MIC is to use a Kalman filter with a so-called reference trajectory: the extrapolation, including material effects, is done on the CPU, independently of the actual track fit.



Introduction

3. Compilers, tools, and support

The main selling point of the MIC architecture is that it is x86: existing code is relatively straightforward to port, and optimization for both the coprocessor and the CPU can be done on a single code base. This contrasts with GPUs, which have direct hardware support for programming models that require large-scale vectorization and parallelization.

