Abstract
Scalable multithreading poses challenges to I/O for the ATLAS experiment. The performance of a thread-safe I/O strategy may depend upon many factors, including I/O latencies, whether tasks are CPU- or I/O-intensive, and thread count. In a multithreaded framework, an I/O infrastructure must efficiently supply event data to and collect it from many threads processing multiple events in flight. In particular, on-demand reading from multiple threads may challenge caching strategies that were developed for serial processing and may need to be enhanced. This I/O infrastructure must also address how to read, make available, and propagate in-file metadata and other non-event data needed as context for event processing. We describe the design and scheduling of I/O components in the ATLAS multithreaded control framework, AthenaMT, for both event and non-event I/O. We discuss issues associated with exploiting the multithreading capabilities of our underlying persistence technology, ROOT, in a manner harmonious with the ATLAS framework's own approach to thread management. Finally, we discuss opportunities for evolution and simplification of I/O components that have successfully supported ATLAS event processing for many years, from their serial incarnations to their thread-safe counterparts.
Highlights
The period of data taking at the LHC will stress the ATLAS [1] computing infrastructure in two important ways: the event rate and the event complexity
High Level Trigger (HLT): The increasing complexity of events challenges both the memory capacity and throughput of the computing farms used for the HLT
ATLAS already makes extensive use of ROOT's TTreeCache system when reading ROOT data. This system was designed for reading data sequentially, whereas a multithreaded framework will process events in no particular order. This, combined with on-demand reading, can lead to cache thrashing at cache boundaries: retrieval of a data object for a not-yet-cached event causes the cache to be flushed while the framework is still processing an event that may still need to retrieve data from the current cache (see the sketch below)
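The cache-boundary behaviour described in the highlight above can be illustrated with a small standalone ROOT program. This is a minimal sketch under stated assumptions, not ATLAS framework code: the file name events.root, the tree name CollectionTree, and the entry numbers are all placeholders chosen only to show how out-of-order, on-demand reads interact with a TTreeCache tuned for sequential access.

// Minimal standalone sketch (not ATLAS code) of the access pattern described
// above: ROOT's TTreeCache prefetches baskets assuming roughly sequential
// entry numbers, so out-of-order, on-demand reads from many in-flight events
// can repeatedly invalidate and refill the cache.
// The file name "events.root" and tree name "CollectionTree" are assumptions.
#include <TFile.h>
#include <TTree.h>
#include <memory>

int main() {
    std::unique_ptr<TFile> file(TFile::Open("events.root", "READ"));
    if (!file || file->IsZombie()) return 1;

    auto* tree = static_cast<TTree*>(file->Get("CollectionTree"));
    if (!tree) return 1;

    tree->SetCacheSize(30 * 1024 * 1024);  // 30 MB TTreeCache
    tree->AddBranchToCache("*", true);     // cache baskets for all branches

    // Sequential access: after its learning phase the cache prefetches the
    // baskets for upcoming entries in large, efficient reads.
    for (Long64_t i = 0; i < 100; ++i) tree->GetEntry(i);

    // Out-of-order, on-demand access, as can happen with many events in
    // flight: a jump well past the prefetched range can force the cached
    // baskets to be dropped and a new range to be prefetched, even though
    // events near the old range may not be finished yet.
    tree->GetEntry(5000);
    tree->GetEntry(120);   // back near the old range: likely a cache miss

    tree->PrintCacheStats();  // report cached vs. uncached reads
    return 0;
}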
Summary
The period of data taking at the LHC will stress the ATLAS [1] computing infrastructure in two important ways: the event rate and the event complexity. Combined with computing architectures which scale by adding cores and coprocessors rather than clock cycles, this requires an approach which emphasizes parallelism, either using multiple processes across the cores or using multithreading within a single process. For memory-constrained workloads such as event reconstruction of raw data, a multithreaded approach is required because of the increasing event complexity (expanding memory footprint) and the expanding core count per node (reduced memory per core). Moving data in and out of these processes efficiently is a necessary component of their effective utilization.