Abstract

This paper addresses the problem of designing agents that dynamically select information from multiple data sources to tackle tasks that involve tracking a target behavior while optimizing a reward. We formulate this problem as a data-driven optimal control problem with integer decision variables and give an explicit expression for its solution. The solution determines how (and when) the agent should use the data from the sources. We also formalize a notion of regret for the agent and, by relaxing the problem, give an upper bound on this regret. Simulations complement the results.
