Abstract

In this note, we consider dynamic assortment optimization with incomplete information under the capacitated multinomial logit choice model. Recently, it has been shown that the regret (the cumulative expected revenue loss caused by offering suboptimal assortments) that any decision policy endures is bounded from below by a constant times $\sqrt{NT}$, where $N$ denotes the number of products and $T$ denotes the time horizon. This result is shown under the assumption that the product revenues are constant, and thus leaves open the question of whether a lower regret rate can be achieved for nonconstant revenue parameters. In this note, we show that this is not the case: for any vector of product revenues, there is a positive constant such that the regret of any policy is bounded from below by this constant times $\sqrt{NT}$. Our result implies that policies that achieve $\mathcal{O}(\sqrt{NT})$ regret are asymptotically optimal for all product revenue parameters.
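The statement above can be written out as a formula. The following is only a sketch in assumed notation, none of which appears in the text above: $w$ is the vector of product revenues, $v$ the vector of unknown MNL preference weights (with the no-purchase weight normalized to 1), $K$ the capacity, $S_t$ the assortment offered by policy $\pi$ at time $t$, and $r(S, v)$ the standard MNL expected revenue. The lower bound is read here in the usual worst-case sense over $v$, which is an assumption, and any conditions on $N$, $K$ and $T$ under which the paper proves it are not reproduced.

$$
r(S, v) = \frac{\sum_{i \in S} w_i v_i}{1 + \sum_{i \in S} v_i},
\qquad
\mathrm{Reg}_\pi(T, v) = \sum_{t=1}^{T} \mathbb{E}_\pi\Big[ \max_{|S| \le K} r(S, v) - r(S_t, v) \Big],
$$

$$
\text{for every } w \text{ there is a } c > 0 \text{ such that, for every policy } \pi,
\qquad
\sup_{v} \mathrm{Reg}_\pi(T, v) \;\ge\; c \sqrt{NT}.
$$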

Highlights

  • We consider the problem of assortment optimization under the multinomial logit (MNL) choice model

  • Two notable recent contributions are from Agrawal et al. [1,2], who construct decision policies based on Thompson Sampling and Upper Confidence Bounds, respectively, and show that the regret of these policies (the cumulative expected revenue loss compared with the benchmark of always offering an optimal assortment) is, up to logarithmic terms, bounded by a constant times $\sqrt{NT}$, where $N$ denotes the number of products and $T$ denotes the length of the time horizon

  • These upper bounds are complemented by the recent work from Chen and Wang [3], who show that the regret of any policy is bounded from below by a positive constant times $\sqrt{NT}$, implying that the policies by Agrawal et al

Summary

Introduction

We consider the problem of assortment optimization under the multinomial logit (MNL) choice model. Two notable recent contributions are from Agrawal et al. [1,2], who construct decision policies based on Thompson Sampling and Upper Confidence Bounds, respectively, and show that the regret of these policies (the cumulative expected revenue loss compared with the benchmark of always offering an optimal assortment) is, up to logarithmic terms, bounded by a constant times $\sqrt{NT}$, where $N$ denotes the number of products and $T$ denotes the length of the time horizon. The complementary lower bound of Chen and Wang [3] is shown under the assumption that the product revenues are constant, which leaves open whether a lower regret rate can be achieved for nonconstant revenues. We settle this open question by proving a $\sqrt{NT}$ regret lower bound for any given vector of product revenues. This implies that policies with $\mathcal{O}(\sqrt{NT})$ regret are asymptotically optimal regardless of the product revenues. In doing so, we confirm the intuition of Chen and Wang [3] that the constraint on $K$ is not tight.
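To make the notion of regret concrete, here is a minimal Python sketch of the benchmark comparison described above. It assumes the standard MNL form in which a product with preference weight `v_i` is chosen from assortment `S` with probability `v_i / (1 + sum of weights in S)`, with the no-purchase option normalized to weight 1. The function names, the brute-force search, and the numbers in the usage example are illustrative assumptions and are not taken from the paper. The sketch also evaluates a fixed sequence of assortments, whereas the regret of a policy additionally takes an expectation over the policy's own randomness.

```python
from itertools import combinations


def expected_revenue(assortment, revenues, weights):
    """Expected revenue from one customer under MNL (no-purchase weight = 1)."""
    denom = 1.0 + sum(weights[i] for i in assortment)
    return sum(revenues[i] * weights[i] for i in assortment) / denom


def best_assortment(revenues, weights, capacity):
    """Brute-force search over all nonempty assortments of size <= capacity.

    Only practical for small instances; used here purely for illustration.
    """
    n = len(revenues)
    candidates = (
        s for k in range(1, capacity + 1) for s in combinations(range(n), k)
    )
    return max(candidates, key=lambda s: expected_revenue(s, revenues, weights))


def cumulative_regret(offered, revenues, weights, capacity):
    """Sum of expected revenue gaps between the optimum and each offered assortment."""
    optimum = expected_revenue(
        best_assortment(revenues, weights, capacity), revenues, weights
    )
    return sum(optimum - expected_revenue(s, revenues, weights) for s in offered)


if __name__ == "__main__":
    # Hypothetical instance: 3 products, capacity 2.
    revenues = [1.0, 0.8, 0.5]
    weights = [0.5, 1.0, 2.0]
    offered = [(0, 1), (1, 2), (2,)]  # assortments shown in periods 1..3
    print(best_assortment(revenues, weights, capacity=2))
    print(cumulative_regret(offered, revenues, weights, capacity=2))
```

In the learning problem the weights are unknown to the decision maker, so a policy has to choose each assortment from past purchase observations only; the lower bound says that no such policy can keep this cumulative gap below order $\sqrt{NT}$.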

Model and main result
Proof outline
Step 1
Step 2
Step 3
Step 4