Abstract

This thesis explores scalable methods for adaptive decision making under uncertainty in stateless environments, where the goal of an agent is to design an experiment, observe the outcome, and plan subsequent experiments so as to achieve a desired goal. Typically, each experiment incurs a large computational or economic cost, and we need to keep the number of experiments to a minimum. Many such problems fall under the bandit framework, where the outcome of each experiment can be viewed as a reward signal, and the goal is to optimise for this reward, i.e. find the design that maximises this reward. A common use case for bandits, pervasive in many industrial and scientific applications, is hyperparameter tuning, where we need to find the optimal configuration of a black-box system by tuning the several knobs which affect its performance. Some applications include statistical model selection, materials design, optimal policy selection in robotics, and maximum likelihood inference in simulation-based scientific models. More generally, bandits are but one class of problems studied under the umbrella of adaptive decision making under uncertainty in stateless environments. Problems such as active learning and design of experiments are other examples of adaptive decision making, but unlike bandits, progress towards a desired goal is not made known to the agent via a reward signal.

With increasingly expensive experiments and demands to optimise over complex input spaces, bandit optimisation tasks face new challenges today. At the same time, there are new opportunities that have not been exploited previously. We study the following questions in this thesis so as to enable the application of bandit and, more broadly, adaptive decision making methods to modern real-world applications.

- Conventional bandit methods work reliably in low-dimensional settings, but scale poorly with input dimensionality. Scaling such methods to high-dimensional domains requires addressing several computational and statistical challenges.
- In many applications, an expensive experiment can be cheaply approximated. We study techniques that can use information from these cheap, lower-fidelity approximations to speed up the overall optimisation process.
- Conventional bandit methods are inherently sequential. We study parallelisation techniques so as to deploy several experiments at the same time.
- Typical methods assume that a design can be characterised by a Euclidean vector. We study bandit methods on graph-structured spaces. As a specific application, we study neural architecture search, which optimises the structure of a neural network by viewing it as a directed graph with node labels and node weights.
- Current methods for adaptive data collection are designed for specific tasks, and have limited applicability in problems with complex and application-specific goals. We study a general framework for sequential design of experiments which allows one to specify their goal and incorporate other domain expertise.

We first delve into the above topics in the bandit framework and then study how they can be extended to broader decision making problems. We develop methods with theoretical guarantees which simultaneously enjoy good empirical performance. As part of this thesis, we also develop an open-source Python framework for scalable and robust bandit optimisation.

