Learning Proportionally Fair Allocations with Low Regret

Mohammad Sadegh Talebi, Alexandre Proutiere

doi:10.1145/3224431

Abstract

This paper addresses a generic sequential resource allocation problem, where in each round a decision maker selects an allocation of resources (servers) to a set of tasks consisting of a large number of jobs. A job of task $i$ assigned to server $j$ is successfully treated with probability $\theta_{ij}$ in a round, and the decision maker is informed on whether this job is completed at the end of the round. The probabilities $\theta_{ij}$'s are initially unknown and have to be learned. The objective of the decision maker is to sequentially assign jobs of various tasks to servers so that it rapidly learns and converges to the Proportionally Fair (PF) allocation (or other similar allocations achieving an appropriate trade-off between efficiency and fairness). We formulate the problem as a multi-armed bandit (MAB) optimization problem, and devise sequential assignment algorithms with low regret (defined as the difference in utility achieved by an oracle algorithm aware of the $\theta_{ij}$'s and by the proposed algorithm over a given number of slots). We first provide the properties of the so-called Restricted-PF (RPF) allocation, obtained by assuming that each task can only use a single server, and in particular show that it is very close to the PF allocation. We devise ES-RPF, an algorithm that learns the RPF allocation with regret no greater than $\mathcal{O}\bigl(\frac{m^3}{\theta_{\min}\Delta_{\min}}\log(T)\bigr)$ after $T$ slots, where $m$, $\theta_{\min}$, and $\Delta_{\min}$ represent the number of tasks, the minimum success rate $\min_{i,j}\theta_{ij}$, and an appropriately defined notion of gap, respectively. We further provide regret lower bounds satisfied by any algorithm targeting the RPF allocation. Finally, we present ES-PF, an algorithm directly learning the PF allocation, and prove that its regret does not exceed $\mathcal{O}\bigl(\frac{m^2}{s\theta_{\min}}\sqrt{T}\log(T)\bigr)$ after $T$ slots, where $s$ denotes the number of servers.
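To make the RPF allocation and the regret notion in the abstract concrete, here is a minimal Python sketch. It is not the paper's ES-RPF algorithm. It assumes a model the abstract does not spell out: tasks assigned to the same server share it equally, so a task on a server carrying n tasks has expected throughput theta[i][j] / n, and the utility of an allocation is the sum of the logarithms of these throughputs. The function names `rpf_utility`, `brute_force_rpf`, and `regret` are hypothetical, and the search is exhaustive, so it only suits tiny instances.

```python
import itertools
import math


def rpf_utility(theta, assignment):
    """Sum of log-throughputs when task i uses only server assignment[i].

    Assumption (not from the abstract): tasks on the same server split it
    equally, so task i on server j gets theta[i][j] / (tasks on j).
    """
    load = {}
    for j in assignment:
        load[j] = load.get(j, 0) + 1
    return sum(
        math.log(theta[i][j] / load[j]) for i, j in enumerate(assignment)
    )


def brute_force_rpf(theta):
    """Oracle: try every task-to-server assignment (s**m candidates)."""
    m, s = len(theta), len(theta[0])
    candidates = itertools.product(range(s), repeat=m)
    best = max(candidates, key=lambda a: rpf_utility(theta, a))
    return best, rpf_utility(theta, best)


def regret(theta, chosen):
    """Utility lost against the oracle over the slots listed in `chosen`."""
    _, best_utility = brute_force_rpf(theta)
    return sum(best_utility - rpf_utility(theta, a) for a in chosen)


# Hypothetical instance: 2 tasks (rows), 2 servers (columns).
theta = [[0.9, 0.2],
         [0.8, 0.5]]
best, utility = brute_force_rpf(theta)
print(best, utility)
# Two slots: one with both tasks on server 0, one with the oracle choice.
print(regret(theta, [(0, 0), (0, 1)]))
```

In this instance the oracle sends task 0 to server 0 and task 1 to server 1. A learning algorithm does not know `theta` and has to estimate it from the observed job completions, which is where the regret comes from.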

Journal: Proceedings of the ACM on Measurement and Analysis of Computing Systems
Publication Date: Jun 13, 2018
Citations: 1

Similar Papers

Overtaking method based on sand-sifter mechanism: Why do optimistic value functions find optimal solutions in multi-armed bandit problems?
Kento Ochi ... Moto Kamiura
BioSystems | VOL. 135
10 Jul 2015

Policy Gradients for Contextual Recommendations
Feiyang Pan ... Qing He
13 May 2019

Learning Proportionally Fair Allocations with Low Regret
Mohammad Sadegh Talebi ... Alexandre Proutiere
ACM SIGMETRICS Performance Evaluation Review | VOL. 46
12 Jun 2018

Learning Proportionally Fair Allocations with Low Regret
Mohammad Sadegh Talebi ... Alexandre Proutiere
12 Jun 2018
