Cumulative Regret Research Articles

Robotic sampling is attractive in many field robotics applications that require persistent collection of physical samples for ex-situ analysis. Examples abound in the earth sciences in studies involving the collection of rock, soil, and water samples for laboratory analysis. In our test domain, marine ecosystem monitoring, detailed understanding of plankton ecology requires laboratory analysis of water samples, but predictions using physical and chemical properties measured in real-time by sensors aboard an autonomous underwater vehicle (AUV) can guide sample collection decisions. In this paper, we present a data-driven and opportunistic sampling strategy to minimize cumulative regret for batches of plankton samples acquired by an AUV over multiple surveys. Samples are labeled at the end of each survey, and used to update a probabilistic model that guides sampling during subsequent surveys. During a survey, the AUV makes irrevocable sample collection decisions online for a sequential stream of candidates, with no knowledge of the quality of future samples. In addition to extensive simulations using historical field data, we present results from a one-day field trial where beginning with a prior model learned from data collected and labeled in an earlier campaign, the AUV collected water samples with a high abundance of a pre-specified planktonic target. This is the first time such a field experiment has been carried out in its entirety in a data-driven fashion, in effect “closing the loop” on a significant and relevant ecosystem monitoring problem while allowing domain experts (marine ecologists) to specify the mission at a relatively high level.

Read full abstract

Sponsored Search Auctions (SSAs) constitute one of the most successful applications of microeconomic mechanisms. In mechanism design, auctions are usually designed to incentivize advertisers to bid their truthful valuations and, at the same time, to guarantee both the advertisers and the auctioneer a non-negative utility. Nonetheless, in sponsored search auctions, the Click–Through–Rates (CTRs) of the advertisers are often unknown to the auctioneer and thus standard truthful mechanisms cannot be directly applied and must be paired with an effective learning algorithm for the estimation of the CTRs. This introduces the critical problem of designing a learning mechanism able to estimate the CTRs at the same time as implementing a truthful mechanism with a revenue loss as small as possible compared to the mechanism that can exploit the true CTRs. Previous work showed that, when dominant-strategy truthfulness is adopted, in single-slot auctions the problem can be solved using suitable exploration–exploitation mechanisms able to achieve a cumulative regret (on the auctioneer's revenue) of order O˜(T23), where T is the number of times the auction is repeated. It is also known that, when truthfulness in expectation is adopted, a cumulative regret (over the social welfare) of order O˜(T12) can be obtained. In this paper we extend the results available in the literature to the more realistic case of multi-slot auctions. In this case, a model of the user is needed to characterize how the CTR of an ad changes as its position in the allocation changes. In particular, we adopt the cascade model, one of the most popular models for sponsored search auctions, and we prove a number of novel upper bounds and lower bounds on both auctioneer's revenue loss and social welfare w.r.t. the Vickrey–Clarke–Groves (VCG) auction. Furthermore, we report numerical simulations investigating the accuracy of the bounds in predicting the dependency of the regret on the auction parameters.

Read full abstract

Cumulative Regret Research Articles

Articles published on Cumulative Regret

Efficient-UCBV: An Almost Optimal Algorithm Using Variance Estimates

Gaussian process bandits with adaptive discretization

Autonomous Vehicle Control Through the Dynamics and Controller Learning

Online Learning and Decision-Making under Generalized Linear Model with High-Dimensional Data

The non-stationary stochastic multi-armed bandit problem

Randomized allocation with arm elimination in a bandit problem with covariates

Simple and cumulative regret for continuous noisy optimization

Data-driven robotic sampling for marine ecosystem monitoring

Random-Walk Perturbations for Online Combinatorial Optimization

Truthful learning mechanisms for multi-slot sponsored search auctions with externalities

Adaptive NormalHedge for robust visual tracking

Learning with stochastic inputs and adversarial outputs

Pure exploration in finitely-armed and continuous-armed bandits

Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting

Linearly Parameterized Bandits

Regret to the best vs. regret to the average

Strategy Under the Unknown Stochastic Environment: the Nonparametric Lob—Pass Problem

On the bounded regret of empirical bayes estimators

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Cumulative Regret Research Articles

Articles published on Cumulative Regret

Efficient-UCBV: An Almost Optimal Algorithm Using Variance Estimates

Gaussian process bandits with adaptive discretization

Autonomous Vehicle Control Through the Dynamics and Controller Learning

Online Learning and Decision-Making under Generalized Linear Model with High-Dimensional Data

The non-stationary stochastic multi-armed bandit problem

Randomized allocation with arm elimination in a bandit problem with covariates

Simple and cumulative regret for continuous noisy optimization

Data-driven robotic sampling for marine ecosystem monitoring

Random-Walk Perturbations for Online Combinatorial Optimization

Truthful learning mechanisms for multi-slot sponsored search auctions with externalities

Adaptive NormalHedge for robust visual tracking

Learning with stochastic inputs and adversarial outputs

Pure exploration in finitely-armed and continuous-armed bandits

Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting

Linearly Parameterized Bandits

Regret to the best vs. regret to the average

Strategy Under the Unknown Stochastic Environment: the Nonparametric Lob—Pass Problem

On the bounded regret of empirical bayes estimators