Abstract

In a wide range of recommendation systems, self-interested individuals (“agents”) make decisions over time, using information revealed by other agents in the past and producing information that may help agents in the future. Each agent would like to exploit the best action given the current information, but would prefer the previous agents to have explored various alternatives to collect information. A social planner, by means of a well-designed recommendation policy, can incentivize the agents to balance exploration and exploitation so as to maximize social welfare or some other objective. The recommendation policy can be modeled as a multi-armed bandit algorithm under Bayesian incentive-compatibility (BIC) constraints. This line of work has received considerable attention in the “economics and computation” community. Whereas in prior work the planner interacts with a single agent at a time, the present paper allows the agents to affect one another directly in a shared environment. The agents now face two sources of uncertainty: what the environment is, and what the other agents will do. We focus on “explorable” actions: those that can be recommended by some BIC policy. We show how the planner can identify and explore all such actions.
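
To make the BIC constraint concrete, below is a minimal Python sketch of the standard hidden-recommendation idea from the incentivized-exploration literature: the planner recommends the a-priori inferior action either because earlier feedback made it look better or, with small probability, purely for exploration, and the agent cannot tell which case occurred. The two-action setting, the numbers, and the specific rule are illustrative assumptions, not the scheme developed in this paper.

```python
import random

# Toy sketch (not this paper's scheme): two actions with Bernoulli rewards.
# Action 0 looks better a priori; action 1 is the one that needs exploring.
PRIOR_MEAN = {0: 0.6, 1: 0.4}   # agents' prior expectations (assumed values)
TRUE_MEAN = {0: 0.55, 1: 0.7}   # ground truth, unknown to the agents
EPS = 0.05                       # small forced-exploration probability

def sample_reward(action, rng):
    """Draw a Bernoulli reward from the action's hidden true mean."""
    return 1.0 if rng.random() < TRUE_MEAN[action] else 0.0

def recommend_to_agent2(r0, rng):
    """Planner's recommendation for agent 2 after seeing agent 1's reward r0.

    Recommend action 1 either because the observed sample makes action 0
    look worse than action 1's prior mean (exploitation), or with small
    probability EPS purely for exploration.  Agent 2 only sees the
    recommendation, not the reason, so the two cases are pooled.
    """
    if r0 < PRIOR_MEAN[1]:
        return 1                            # exploit: action 1 now looks better
    return 1 if rng.random() < EPS else 0   # rare forced exploration

rng = random.Random(0)
r0 = sample_reward(0, rng)       # agent 1 follows the recommendation of action 0
rec2 = recommend_to_agent2(r0, rng)
print(f"agent 1 observed reward {r0:.0f}; agent 2 is recommended action {rec2}")
```

In the standard analysis of such schemes, the exploration probability must be small enough relative to the prior gap between the actions for following the recommendation to remain a best response; that is the BIC requirement the abstract refers to.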
