Abstract
Task-based runtime systems are an important branch of parallel programming research, since tasks decouple computation from the compute units, giving the runtime system greater flexibility than a thread-based solution. This makes it easier to deal with the ever-increasing complexity of parallel architectures by providing a separation of concerns: the specification of parallelism is separated from the implementation of the parallel computations on a specific architecture. The Open Community Runtime (OCR) is one such system, aimed at large-scale parallel systems. Unlike the creators of many other task-based runtime systems, its creators provided not only an implementation but also a comprehensive specification document. This has allowed us to create an independent implementation, called OCR-Vx. In this article, we present our experience of developing the runtime system, put our work in the context of the specification and the other implementations, and describe key lessons that we have learned during our work. We discuss the design and implementation issues of task-based runtime systems and applications, including task synchronization and scheduling, data management, memory consistency, the relation between shared-memory and distributed-memory runtime systems, NUMA architectures, and heterogeneous systems. The article is aimed at audiences not familiar with OCR, since we believe these lessons could be valuable for developers working on other task-based runtime systems or designing new ones.
Highlights
The Open Community Runtime (OCR, [37, 38]) is an open specification of a distributed task-based runtime system for extreme-scale parallel systems.
Much of our later work focused on how OCR can be executed efficiently on non-uniform memory architecture (NUMA) systems [20], since we discovered that aspects of the OCR design originally aimed at distributed-memory systems open up interesting possibilities for shared-memory NUMA systems.
The specific contributions and topics covered by this article include a discussion of the effect a formal specification document has on the whole ecosystem, the design trade-offs in a distributed runtime system, the OCR consistency model, novel approaches to mapping single-program multiple-data (SPMD) applications to task-based applications, the reasons for having three different OCR implementations, the NUMA support in our OCR-Vx implementation, and experimental performance results for several benchmarks on a variety of architectures.
Summary
The Open Community Runtime (OCR, [37, 38]) is an open specification of a distributed task-based runtime system for extreme-scale parallel systems. Task-based runtime systems are considered a promising way to address the challenges of programming future parallel systems, since they have greater control over the execution of the application. Our goal is to convey the lessons learned from our work on OCR-Vx without burdening the reader with details about OCR. We believe this information could be beneficial to anyone working on another existing task-based runtime system or designing a new one. The specific contributions and topics covered by this article include a discussion of the effect a formal specification document has on the whole ecosystem, the design trade-offs in a distributed runtime system, the OCR consistency model, novel approaches to mapping single-program multiple-data (SPMD) applications to task-based applications, the reasons for having three different OCR implementations, the NUMA support in our OCR-Vx implementation, and experimental performance results for several benchmarks on a variety of architectures.