Abstract
Motivated by the observation that overexposure to unwanted marketing activities can lead to customer dissatisfaction, we consider a setting where a platform offers a sequence of messages to its users and is penalized when users abandon the platform due to marketing fatigue. We propose a novel sequential choice model to capture the multiple interactions that take place between the platform and its users: upon receiving a message, a user decides whether to accept or reject it. If she rejects it, she then decides either to receive the next message in the sequence or to abandon the platform. Based on user feedback, the platform dynamically learns users' abandonment distribution and the relevance of the recommended content. To maximize the cumulative payoff over a horizon of length T, the platform dynamically adjusts which messages are shown to a user and the order in which they are shown. We refer to this online learning task as the sequential choice bandit (SC-Bandit) problem. For the offline combinatorial optimization problem, we present a polynomial-time algorithm. For the online problem, we consider two variants, depending on whether contexts are included, and propose algorithms that balance exploration and exploitation. Lastly, we evaluate the performance of our algorithms on both synthetic and real-world datasets.
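The interaction described above can be illustrated with a minimal simulation sketch. It is not the paper's formulation: the function names, the constant per-rejection abandonment probability `abandon_prob`, the unit `reward` for an acceptance, the fixed `penalty` for an abandonment, and the assumption that an acceptance ends the interaction are all hypothetical choices made for illustration.

```python
import random


def simulate_user(accept_probs, abandon_prob, reward=1.0, penalty=0.5, rng=None):
    """Simulate one user's pass through a message sequence.

    accept_probs: hypothetical acceptance probability of each message, in display order.
    abandon_prob: assumed constant probability of abandoning after each rejection.
    Returns (payoff, (outcome, position)).
    """
    rng = rng or random.Random()
    for position, p_accept in enumerate(accept_probs):
        # Step 1: the user accepts or rejects the current message.
        if rng.random() < p_accept:
            return reward, ("accepted", position)
        # Step 2: after a rejection, the user either abandons or continues.
        if rng.random() < abandon_prob:
            return -penalty, ("abandoned", position)
    # The user rejected every message but stayed on the platform.
    return 0.0, ("exhausted", None)


def expected_payoff(accept_probs, abandon_prob, reward=1.0, penalty=0.5):
    """Expected payoff of a fixed sequence under the same assumptions."""
    total = 0.0
    reach = 1.0  # probability that the user reaches the current position
    for p_accept in accept_probs:
        total += reach * (p_accept * reward - (1 - p_accept) * abandon_prob * penalty)
        reach *= (1 - p_accept) * (1 - abandon_prob)
    return total
```

Under these assumptions, `expected_payoff` makes the trade-off visible: appending a message adds a chance of acceptance but also a chance of abandonment, so both the choice of messages and their order affect the payoff.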