Abstract

This paper introduces a resource allocation framework specifically tailored for addressing the problem of dynamic placement (or pinning) of parallelized applications to processing units. Under the proposed setup, each thread of the parallelized application constitutes an independent decision maker (or agent), which (based on its own prior performance measurements and its own prior CPU affinities) decides on which processing unit to run next. Decisions are updated recursively for each thread by a resource manager/scheduler, which runs in parallel to the application’s threads, periodically records their performances, and assigns them new CPU affinities. For updating the CPU affinities, the scheduler uses a distributed reinforcement-learning algorithm, each branch of which is responsible for assigning a new placement strategy to each thread. The proposed framework is flexible enough to address alternative optimization criteria, such as maximum average processing speed and minimum speed variance among threads. We demonstrate analytically that convergence to locally optimal placements is achieved asymptotically. Finally, we validate these results through experiments on Linux platforms.

Highlights

  • We demonstrate through experiments on a Linux platform that the proposed algorithm outperforms the scheduling strategies of the operating system with respect to completion time

  • We demonstrate the response of the Reinforcement Learning (RL) scheme in comparison to the Operating System (OS) response

  • We propose a measurement-based learning scheme for addressing the problem of efficient dynamic pinning of parallelized applications to processing units


Introduction

Resource allocation has become an indispensable part of the design of any engineering system that consumes resources, such as electric power in home energy management [1], access bandwidth and battery life in wireless communications [8], computing bandwidth under certain QoS requirements [2], computing bandwidth for time-sensitive applications [5], and computing bandwidth and memory in parallelized applications [3]. In such dynamic environments, it is more appropriate to consider learning-based optimization techniques, where the scheduling policy is updated based on performance measurements from the running threads. Through such a measurement- or learning-based scheme, we can (a) reduce information complexity (i.e., when dealing with a large number of possible thread/memory bindings), since only performance measurements need to be collected during runtime, and (b) adapt to uncertain or irregular application behavior. To this end, this paper proposes a dynamic (algorithm-based) scheme for optimally allocating the threads of a parallelized application to a set of available CPU cores.

Notation: Π_{Δ(n)}[x] denotes the projection of a vector x ∈ R^n onto the simplex Δ(n); e_j ∈ R^n denotes the unit vector whose jth entry equals 1 while all other entries are zero; for a vector σ ∈ Δ(n), rand_σ[a_1, ..., a_n] denotes the random selection of an element of the set {a_1, ..., a_n} according to the distribution σ.
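The two operations in the notation above can be sketched in code. The following Python snippet is an illustrative sketch only, not the paper's implementation: it implements rand_σ sampling and the Euclidean projection onto the simplex Δ(n), and shows one hypothetical reinforcement-style strategy step (the step size eps, the payoff value, and the four-core setup are all assumptions made for the example).

```python
import random

def project_simplex(x):
    """Euclidean projection of a vector x onto the probability simplex Δ(n)."""
    n = len(x)
    u = sorted(x, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j in range(1, n + 1):
        cumulative += u[j - 1]
        t = (1.0 - cumulative) / j
        if u[j - 1] + t > 0:
            theta = t  # last j satisfying the condition determines the shift
    return [max(xi + theta, 0.0) for xi in x]

def rand_sigma(sigma, items):
    """rand_σ[a_1, ..., a_n]: pick items[j] with probability sigma[j]."""
    r, acc = random.random(), 0.0
    for p, a in zip(sigma, items):
        acc += p
        if r < acc:
            return a
    return items[-1]

# One hypothetical strategy-update step over 4 cores:
sigma = [0.25, 0.25, 0.25, 0.25]        # mixed strategy over CPU cores 0..3
core = rand_sigma(sigma, [0, 1, 2, 3])  # sampled core; on Linux one could
                                        # then call os.sched_setaffinity(0, {core})
payoff, eps = 0.8, 0.1                  # assumed measured payoff and step size
e = [1.0 if j == core else 0.0 for j in range(4)]
sigma = project_simplex([s + eps * payoff * (ej - s) for s, ej in zip(sigma, e)])
```

After the update, sigma remains a valid distribution (nonnegative entries summing to one), slightly shifted toward the unit vector e_j of the sampled core in proportion to the measured payoff.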

Framework
Measurement- or learning-based optimization
Distributed learning
Objective
Multi-Agent Formulation
Strategy
Assignment Game
Nash Equilibria
Efficient assignments vs Nash equilibria
Strategy update
Convergence Analysis
Experiments
Experimental Setup
Experiment 1
Experiment 2
Conclusions