Abstract

Task-based runtime systems are an important branch of parallel programming research, since tasks decouple computation from the compute units, giving the runtime system greater flexibility than a thread-based solution. This makes it easier to deal with the ever-increasing complexity of parallel architectures by providing a separation of concerns: the specification of parallelism is separated from the implementation of the parallel computations on a specific architecture. The Open Community Runtime (OCR) is one such system, aimed at large-scale parallel systems. Unlike many other task-based runtime systems, OCR comes not only with an implementation from its creators but also with a comprehensive specification document. This has allowed us to create an independent implementation, called OCR-Vx. In this article, we present our experience of developing the runtime system, put our work in the context of the specification and the other implementations, and describe key lessons that we have learned during our work. We discuss the design and implementation issues of task-based runtime systems and applications, including task synchronization and scheduling, data management, memory consistency, the relation between shared-memory and distributed-memory runtime systems, NUMA architectures, and heterogeneous systems. The article is aimed at audiences not familiar with OCR, since we believe these lessons could be valuable for developers working on other task-based runtime systems or designing new ones.

Highlights

  • The Open Community Runtime (OCR, [37, 38]) is an open specification of a distributed task-based runtime system for extreme-scale parallel systems

  • Much of our later work focused on how OCR can be executed efficiently on non-uniform memory architecture (NUMA) systems [20], since we discovered that the aspects of the OCR design originally aimed at distributed-memory systems open up some interesting possibilities for shared-memory NUMA systems

  • The specific contributions and topics covered by this article include a discussion of the effect a formal specification document has on the whole ecosystem, the design trade-offs in a distributed runtime system, the OCR consistency model, novel approaches to mapping single-program multiple-data (SPMD) applications to task-based applications, the reasons for having three different OCR implementations, the NUMA support in our OCR-Vx implementation, and experimental performance results with different benchmarks on a variety of different architectures

Summary

Introduction

The Open Community Runtime (OCR, [37, 38]) is an open specification of a distributed task-based runtime system for extreme-scale parallel systems. Task-based runtime systems are considered a promising way of addressing the challenges of programming future parallel systems, since they have greater control over the execution of the application. Our goal is to convey the lessons learned from our work on OCR-Vx, without burdening the reader with details about OCR. We believe this information could be beneficial to anyone working on another existing task-based runtime system or designing a new system. The specific contributions and topics covered by this article include a discussion of the effect a formal specification document has on the whole ecosystem, the design trade-offs in a distributed runtime system, the OCR consistency model, novel approaches to mapping single-program multiple-data (SPMD) applications to task-based applications, the reasons for having three different OCR implementations, the NUMA support in our OCR-Vx implementation, and experimental performance results with different benchmarks on a variety of different architectures.

The Open Community Runtime
OCR basics
Design trade‐offs
Impact and evolution of the OCR specification
Memory model
Channel events and SPMD
Labeled GUIDs
Local identifiers and I/O
OCR‐Vx
Brief history of OCR‐Vx
OCR‐Vdm
OCR‐Vdm single process pseudo‐distributed environment
Distributed state management
OpenCL support
Fault tolerance
OCR‐Vsm
NUMA support
Automatic task and data placement
OCR‐V1
Experimental evaluation
Seismic
Shared‐memory systems
Distributed‐memory systems
Stencil2d
Face detection
Levenshtein on a heterogeneous system
Related work
Findings
Conclusion