Abstract

The ALFA framework supports the software development of major High Energy Physics experiments. As part of our research effort to optimize the transport layer of ALFA, we focus on profiling its data transfer performance for inter-node communication on the Intel Xeon Phi Coprocessor. In this article we present the collected performance measurements together with an analysis of the results. The optimization opportunities we discovered help us formulate future plans for enabling high-performance data transfer for ALFA on the Intel Xeon Phi architecture.

Highlights

  • ALFA is the concurrency framework supporting the development of the data processing and event reconstruction software for ALICE and FAIR high energy physics experiments [1, 2]

  • By collecting a set of performance measurements, we showed that for large message sizes the Symmetric Communications InterFace (SCIF) data transport mechanism of the Intel Xeon Phi Coprocessor outperforms the ZeroMQ and NanoMSG messaging libraries by over an order of magnitude

  • We take into account that the event size of the O2 software of ALICE is around 24 MB; for payloads of this size, SCIF Remote (Direct) Memory Access (RMA) operates at its maximum bandwidth
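
To put the 24 MB payload size in perspective, the following back-of-envelope sketch estimates a lower bound on the transfer time of one such event. It assumes the coprocessor is attached over a PCIe Gen2 x16 link and uses the nominal figures of that link (5 GT/s per lane, 8b/10b encoding); these are specification values, not measurements from the article, and the function names are illustrative only. Achievable bandwidth is lower in practice because of protocol overhead.

```python
# Back-of-envelope sketch: nominal PCIe Gen2 figures, NOT measured results.
# All names here are hypothetical helpers introduced for illustration.

GT_PER_S_PER_LANE = 5.0e9      # PCIe Gen2 raw signalling rate per lane (transfers/s)
ENCODING_EFFICIENCY = 8 / 10   # 8b/10b line encoding: 8 payload bits per 10 line bits
LANES = 16                     # assumed x16 link


def pcie_gen2_peak_bytes_per_s(lanes=LANES):
    """Nominal peak bandwidth per direction, in bytes per second."""
    bits_per_s = GT_PER_S_PER_LANE * ENCODING_EFFICIENCY * lanes
    return bits_per_s / 8


def min_transfer_time_s(payload_bytes, bandwidth_bytes_per_s):
    """Lower bound on transfer time, ignoring latency and protocol overhead."""
    return payload_bytes / bandwidth_bytes_per_s


PAYLOAD_BYTES = 24 * 10**6     # ~24 MB event, decimal megabytes assumed

peak = pcie_gen2_peak_bytes_per_s()
print(f"nominal peak: {peak / 1e9:.1f} GB/s per direction")
print(f"lower bound for one event: "
      f"{min_transfer_time_s(PAYLOAD_BYTES, peak) * 1e3:.1f} ms")
```

Under these assumptions the nominal peak is about 8 GB/s per direction, so a 24 MB event cannot cross the link in less than roughly 3 ms. A payload this large amortizes per-transfer setup cost, which is consistent with the statement that RMA reaches its maximum bandwidth at this size.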

Summary

Introduction

ALFA is the concurrency framework supporting the development of the data processing and event reconstruction software for ALICE and FAIR high energy physics experiments [1, 2]. By taking advantage of the abstractions provided by FairMQ, one can establish a data processing topology that spans a computing cluster, potentially featuring heterogeneous computing hardware. Distributed applications, such as the O2 software of ALICE [1], consist of hundreds of loosely coupled processes. The motivation for porting such processes to the Intel Xeon Phi Coprocessor is to increase the execution efficiency of software components that can take advantage of the computational capabilities of this architecture (high core count, wide vector engines, high memory bandwidth, etc.).

Background
PCIe Gen2 16x lanes
Performance testing programs
SCIF RMA Coprocessor to Host performance drop