Minimizing CPU Utilization Requirements to Monitor an ATLAS Data Transfer System

G Leventis, M Dönszelmann, J Schumacher

doi:10.1088/1748-0221/15/02/c02009


Open Access

https://doi.org/10.1088/1748-0221/15/02/c02009


Abstract

The ATLAS experiment at the LHC will use a PC-based read-out component called FELIX to connect its front-end electronics to the Data Acquisition System. FELIX translates custom front-end protocols to Ethernet and vice versa. Currently, FELIX uses multi-threading to meet its data rate requirements. Establishing the FELIX operating conditions requires monitoring its parameters, including, but not limited to, data counters and rates as well as compute resource utilisation. For these statistics to be of practical use, however, the parallel threads must communicate with one another. The FELIX monitoring implementation prior to this research used thread-safe queues to which the parallel threads pushed data; a central thread extracted and combined the queue contents. Enabling statistics reduced the throughput to less than a fifth of the baseline performance.

To minimize this performance hit, we take advantage of the CPU's microarchitecture features and reduce concurrency. The focus is on hardware-supported atomic operations: when a thread performs an atomic operation, the other threads see it as happening instantaneously. Atomic operations are used to complement and/or replace the lock mechanisms of parallel computing. The queue system is replaced with sets of C/C++ atomic variables and corresponding atomic functions, hereinafter referred to as atomics. Three implementations are tested. Implementation I has one set of atomic variables that is updated by all the parallel threads. Implementation II has one set of atomic variables per thread; these sets are periodically accumulated by a central thread. Implementation III is the same as Implementation II, but with measures taken to eliminate any concurrency implications. The compiler used during the measurements is GCC, which supports the hardware (microarchitecture) optimizations for atomics.
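The difference between the first two layouts can be illustrated with a minimal C++ sketch. It is not the FELIX code: the `Stats` struct, its two counters, the 1024-byte block size and the function names `run_shared` and `run_per_thread` are all hypothetical, and the per-thread sets are summed once after the workers finish rather than periodically by a dedicated central thread.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical statistics set; the abstract does not list the real counters.
struct Stats {
    std::atomic<std::uint64_t> blocks{0};
    std::atomic<std::uint64_t> bytes{0};
};

// Implementation I style: every worker updates the same shared set.
std::uint64_t run_shared(unsigned n_threads, std::uint64_t n_updates) {
    assert(n_threads > 0);
    Stats shared;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n_threads; ++t) {
        workers.emplace_back([&shared, n_updates] {
            for (std::uint64_t i = 0; i < n_updates; ++i) {
                shared.blocks.fetch_add(1, std::memory_order_relaxed);
                shared.bytes.fetch_add(1024, std::memory_order_relaxed);  // assumed block size
            }
        });
    }
    for (auto& w : workers) w.join();
    return shared.blocks.load(std::memory_order_relaxed);
}

// Implementation II style: one set per worker, combined by a single reader.
std::uint64_t run_per_thread(unsigned n_threads, std::uint64_t n_updates) {
    assert(n_threads > 0);
    std::vector<Stats> per_thread(n_threads);  // sized once, never resized
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n_threads; ++t) {
        workers.emplace_back([&per_thread, t, n_updates] {
            Stats& mine = per_thread[t];  // only this worker writes to this set
            for (std::uint64_t i = 0; i < n_updates; ++i) {
                mine.blocks.fetch_add(1, std::memory_order_relaxed);
                mine.bytes.fetch_add(1024, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) w.join();
    // Simplification: accumulate once here instead of periodically.
    std::uint64_t total = 0;
    for (const Stats& s : per_thread) total += s.blocks.load(std::memory_order_relaxed);
    return total;
}
```

Both functions return the total block count, so `run_shared(4, 10000)` and `run_per_thread(4, 10000)` should agree. Note that in the per-thread variant neighbouring `Stats` objects can still occupy the same cache line.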
Implementations I and II showed negligible differences compared to the original implementation, and some benchmarks even showed a deterioration in performance. Implementation III (concurrency and cache optimized) improved performance by up to a factor of six compared to the original implementation, bringing the achieved throughput significantly closer to the desired level. Similarly structured software applications could benefit from the results of this research, especially from Implementation III. The results demonstrate that atomics can be useful for efficient computations in a multi-threaded environment. However, they also make clear that concurrency, cache invalidation and proper use of the system's microarchitecture need to be taken into account in this programming model. The paper details the challenges of using atomics properly and how they are overcome in the implementation of the FELIX monitoring system.
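The abstract does not say which measures Implementation III takes. One common way to limit cache invalidation between threads is to place each per-thread counter set on its own cache line, and the sketch below shows that technique only as an assumption about what "cache optimized" might involve. The names `PaddedStats`, `kCacheLine` and `run_padded` are hypothetical, the 64-byte line size is assumed, and the code relies on C++17 for over-aligned allocation inside `std::vector`.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Assumed cache-line size in bytes; 64 is common on x86-64 but not universal.
constexpr std::size_t kCacheLine = 64;

// Hypothetical per-thread counter set, aligned so that two sets never share a line.
struct alignas(kCacheLine) PaddedStats {
    std::atomic<std::uint64_t> blocks{0};
};

static_assert(sizeof(PaddedStats) == kCacheLine, "each set should fill one cache line");

std::uint64_t run_padded(unsigned n_threads, std::uint64_t n_updates) {
    assert(n_threads > 0);
    std::vector<PaddedStats> per_thread(n_threads);  // C++17 honours the alignment
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n_threads; ++t) {
        workers.emplace_back([&per_thread, t, n_updates] {
            std::atomic<std::uint64_t>& mine = per_thread[t].blocks;
            for (std::uint64_t i = 0; i < n_updates; ++i)
                mine.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();
    // A single reader sums the per-thread sets.
    std::uint64_t total = 0;
    for (const PaddedStats& s : per_thread)
        total += s.blocks.load(std::memory_order_relaxed);
    return total;
}
```

With this layout a write by one worker does not invalidate the cache line holding another worker's counters. Whether this matches the optimizations in the paper cannot be determined from the abstract.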


Journal: Journal of Instrumentation
Publication Date: Feb 1, 2020
Citations: 1
License type: cc-by


Similar Papers

A Comparative Evaluation of Parallel Programming Models for Shared-Memory Architectures
Luis Miguel Sanchez ... Javier Fernandez
01 Jul 2012

Concurrent Parallel Processing on Graphics and Multicore Processors with OpenACC and OpenMP
Christopher P Stone ... Roger L Davis
01 Jan 2018

SSSP on GPU Without Atomic Operation
Feng Wang ... Changyou Zhang
01 Jan 2015

Parallel Variable-Length Encoding on GPGPUs
Ana Balevic
01 Jan 2009
