A tractable online learning algorithm for the multinomial logit contextual bandit

Priyank Agrawal, Theja Tulabandhula, Vashist Avadhanula

doi:10.1016/j.ejor.2023.02.036

Open Access

https://doi.org/10.1016/j.ejor.2023.02.036

Abstract

In this paper, we consider the contextual variant of the MNL-Bandit problem. More specifically, we consider a dynamic set optimization problem in which a decision-maker offers a subset (assortment) of products to a consumer and observes the response in every round. Consumers purchase products to maximize their utility. We assume that a set of attributes describes each product and that the mean utility of a product is linear in the values of these attributes. We model consumer choice behavior using the widely used Multinomial Logit (MNL) model and consider the decision-maker's problem of dynamically learning the model parameters while optimizing cumulative revenue over the selling horizon T. Though this problem has recently attracted considerable attention, many existing methods involve solving an intractable non-convex optimization problem, and their theoretical performance guarantees depend on a problem-dependent parameter that could be prohibitively large. In particular, current algorithms for this problem have regret bounded by O(√(κdT)), where κ is a problem-dependent constant that may depend exponentially on the number of attributes, d. In this paper, we propose an optimistic algorithm and show that its regret is bounded by O(√(dT) + κ), significantly improving on existing methods. Further, we propose a convex relaxation of the optimization step, which allows for tractable decision-making while retaining the favorable regret guarantee. We also demonstrate through numerical experiments that our algorithm performs robustly for varying values of κ.
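To make the choice model in the abstract concrete, the following is a minimal sketch of MNL purchase probabilities with linear utilities, together with the expected revenue of an assortment. It is not the paper's algorithm: the parameter vector `theta` is treated as known rather than learned online, the outside (no-purchase) option is assumed to have utility 0, and the function names, the per-product revenues, and the brute-force search are all illustrative assumptions.

```python
import math
from itertools import combinations


def mnl_choice_probs(theta, features, assortment):
    """Purchase probability of each offered product, plus no-purchase (key None)."""
    # Mean utility is linear in the attributes: u_i = <x_i, theta>.
    weights = {
        i: math.exp(sum(x * t for x, t in zip(features[i], theta)))
        for i in assortment
    }
    # The leading 1.0 is the outside option, assumed to have utility 0.
    denom = 1.0 + sum(weights.values())
    probs = {i: w / denom for i, w in weights.items()}
    probs[None] = 1.0 / denom
    return probs


def expected_revenue(theta, features, revenues, assortment):
    """Expected one-round revenue from offering the given assortment."""
    probs = mnl_choice_probs(theta, features, assortment)
    return sum(revenues[i] * probs[i] for i in assortment)


def best_assortment(theta, features, revenues, max_size):
    """Brute-force search over non-empty assortments of at most max_size products."""
    items = range(len(features))
    candidates = (
        s for k in range(1, max_size + 1) for s in combinations(items, k)
    )
    return max(
        candidates,
        key=lambda s: expected_revenue(theta, features, revenues, s),
    )


if __name__ == "__main__":
    theta = [1.0, -0.5]                              # hypothetical parameters
    features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical attributes
    revenues = [1.0, 2.0, 1.5]                       # hypothetical revenues
    chosen = best_assortment(theta, features, revenues, max_size=2)
    print(chosen, expected_revenue(theta, features, revenues, chosen))
```

The brute-force search is only practical for a handful of products. In the bandit setting described above, `theta` is unknown and would be replaced by an optimistic estimate updated from the observed choices.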


Journal: European Journal of Operational Research
Publication Date: Mar 1, 2023
Citations: 1

Similar Papers

Constrained dynamic multi-objective evolutionary optimization for operational indices of beneficiation process
Cuie Yang ... Jinliang Ding
Journal of Intelligent Manufacturing | VOL. 30
04 Apr 2017

Empirical Study of Population-Based Dynamic Constrained Multimodal Optimization Algorithms
Xin Lin ... Wenjian Luo
01 Dec 2019

EvoDCMMO: Benchmarking and solving dynamic constrained multimodal optimization problems
Xin Lin ... Tao Zhu
Swarm and Evolutionary Computation | VOL. 75
01 Dec 2022

New method for solving a class of dynamic nonlinear constrained optimization problems
Chun-An Liu
01 Aug 2010
