Abstract

Performance degradation due to nonuniform data access latencies has worsened on NUMA systems and can now be felt on-chip in manycore processors. Distributing data across NUMA nodes and manycore processor caches is necessary to reduce the impact of nonuniform latencies. However, techniques for distributing data are error-prone and fragile, and they require low-level architectural knowledge. Existing task scheduling policies favor quick load balancing at the expense of locality and ignore NUMA node and manycore cache access latencies. Locality-aware scheduling, in conjunction with or as a replacement for existing scheduling, is necessary to minimize NUMA effects and sustain performance. We present a data distribution and locality-aware scheduling technique for task-based OpenMP programs executing on NUMA systems and manycore processors. Our technique relieves the programmer of reasoning about NUMA system and manycore processor architectural details by delegating data distribution to the runtime system, and it uses task data dependence information to guide the scheduling of OpenMP tasks so as to reduce data stall times. We demonstrate our technique on a four-socket AMD Opteron machine with eight NUMA nodes and on the TILEPro64 processor, and we show that data distribution and locality-aware task scheduling improve performance by up to 69% for scientific benchmarks compared to default policies, while remaining architecture-oblivious from the programmer's perspective.
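To make the dependence-driven part of the technique concrete, here is a minimal OpenMP sketch; array names, sizes, and the chunking are illustrative assumptions, not the paper's benchmark code. The depend clauses declare each task's data footprint, which is exactly the information a locality-aware runtime can match against the placement of the underlying data.

    /* Minimal sketch: depend clauses expose each task's data footprint.
       Array names and sizes are illustrative, not taken from the paper.
       Build: gcc -fopenmp sketch.c (requires OpenMP 4.0+ for array
       sections in depend clauses). */
    #include <stdio.h>

    #define N     4096
    #define CHUNK 512

    static double a[N], b[N];

    int main(void)
    {
        for (int i = 0; i < N; i++)
            a[i] = (double)i;

        #pragma omp parallel
        #pragma omp single
        {
            for (int i = 0; i < N; i += CHUNK) {
                /* The array-section depend clauses tell the runtime which
                   CHUNK-sized blocks this task reads and writes; a
                   locality-aware scheduler can run the task on a core near
                   the NUMA node (or cache bank) holding a[i:CHUNK]. */
                #pragma omp task depend(in: a[i:CHUNK]) depend(out: b[i:CHUNK])
                for (int j = i; j < i + CHUNK; j++)
                    b[j] = 2.0 * a[j];
            }
            #pragma omp taskwait
        }
        printf("b[N-1] = %.1f\n", b[N - 1]);
        return 0;
    }

In any OpenMP 4.0+ implementation the depend clauses already enforce correct task ordering; the technique described above additionally exploits this information for placement.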

Highlights

  • NUMA systems consist of several multicore processors attached to local memory modules

  • Modern NUMA systems have reached such size and complexity that even simple memory-oblivious parallel executions such as the task-based Fibonacci program with work-stealing scheduling have begun to suffer from NUMA effects [1]

  • We present a locality-aware scheduling algorithm for OpenMP tasks which reduces memory access times by leveraging locality information gained from data distribution together with task data footprint information supplied by the programmer (a hypothetical dispatch sketch follows this list)
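As an illustration of how such a scheduler could act on that information, the sketch below is hypothetical: worker_for_node and the one-worker-per-node mapping are invented for exposition and are not the paper's runtime. It queries the NUMA node currently holding a task's input page through the Linux move_pages interface and returns a worker near that node.

    /* Hypothetical sketch of a locality-aware dispatch decision on Linux.
       worker_for_node() is an invented placeholder (it assumes one worker
       per NUMA node, with worker id == node id).  move_pages() with
       nodes == NULL acts as a query: it reports the node holding each
       page.  Build: gcc sched_sketch.c -lnuma */
    #include <numaif.h>   /* move_pages */
    #include <stdint.h>
    #include <unistd.h>

    static int worker_for_node(int node) { return node; }

    /* Decide where to enqueue a task whose main input starts at 'data'. */
    static int preferred_worker(void *data, int fallback_worker)
    {
        long page_size = sysconf(_SC_PAGESIZE);
        void *page = (void *)((uintptr_t)data & ~(uintptr_t)(page_size - 1));
        int status = -1;

        /* 'status' receives the NUMA node of the page, or a negative
           errno value if the page is not present. */
        if (move_pages(0, 1, &page, NULL, &status, 0) == 0 && status >= 0)
            return worker_for_node(status);
        return fallback_worker;   /* untouched page or failed query */
    }

The fallback path matters: a page that has not yet been first-touched has no home node, so the scheduler must fall back to ordinary load balancing in that case.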

Summary

Introduction

NUMA systems consist of several multicore processors attached to local memory modules. Cores within a processor can access local memory both faster and with higher bandwidth than remote memory. Data distribution is also required on manycore processors, which exhibit on-chip NUMA effects due to banked shared caches: cores access their local cache bank faster than remote banks, and the latency of accessing far-off remote banks approaches off-chip memory access latencies. A further performance consideration is that cache coherence on manycore processors is software configurable [2], so scheduling should adapt to remote cache bank access latencies that change with the configuration.
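As a concrete, if simplified, picture of what data distribution means here, the following libnuma sketch places blocks of an array round-robin across NUMA nodes. Block count and size are illustrative; the argument above is precisely that the runtime system should perform this step so that the programmer does not have to.

    /* Minimal sketch of explicit block-wise data distribution with
       libnuma on Linux.  Block size and count are illustrative.
       Build: gcc distrib.c -lnuma */
    #include <numa.h>
    #include <stdio.h>

    #define NBLOCKS 8
    #define BLOCK   (1 << 20)   /* 1 MiB per block */

    int main(void)
    {
        if (numa_available() < 0) {
            fprintf(stderr, "libnuma: NUMA is not available\n");
            return 1;
        }
        int nodes = numa_max_node() + 1;
        double *block[NBLOCKS];

        /* Round-robin the blocks over the NUMA nodes so that tasks
           placed near a node mostly touch local memory. */
        for (int i = 0; i < NBLOCKS; i++)
            block[i] = numa_alloc_onnode(BLOCK, i % nodes);

        /* ... compute on the blocks, scheduling tasks near their data ... */

        for (int i = 0; i < NBLOCKS; i++)
            numa_free(block[i], BLOCK);
        return 0;
    }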
