Abstract

Online minimization of an unknown convex function over a convex and compact set is considered under first-order stochastic bandit feedback, in which each query returns a random realization of the gradient of the function at the query point. Without knowing the distribution of the random gradients, a learning algorithm sequentially chooses query points with the objective of minimizing regret, defined as the expected cumulative loss of the function values at the query points in excess of the minimum value of the function. An active search strategy is developed based on a biased random walk on an infinite-depth tree constructed through successive partitioning of the domain of the function. It is shown that the biased random walk moves toward the optimal point at a geometric rate, leading to an order-optimal regret of $\tilde{\mathcal{O}}(\sqrt{T})$. The structural properties of this random-walk-based strategy admit a detailed finite-time regret analysis. By localizing data processing to small subsets of the input domain based on the tree structure, the strategy enjoys $\mathcal{O}(1)$ computation and memory complexity per query and allows dynamic allocation of limited data storage.
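
The abstract leaves the algorithm's details to the paper, but the tree-walk idea can be illustrated. Below is a minimal one-dimensional sketch, not the paper's exact procedure: nodes of an infinite binary tree are dyadic sub-intervals of $[0,1]$, and a fixed-sample majority vote over noisy gradient signs (`majority_sign`, an assumption standing in for the paper's local sequential test) decides whether the walk descends into a child or backtracks to the parent. Because each local decision is correct with probability above $1/2$, the walk drifts geometrically toward the sub-intervals containing the minimizer. The toy oracle `noisy_grad`, the minimizer `x_star`, and the per-test sample count `n_samples` are all hypothetical choices for illustration.

```python
import random


def noisy_grad(x, x_star=0.3, sigma=0.5):
    """Toy first-order oracle: gradient of f(x) = (x - x_star)^2 plus
    Gaussian noise, standing in for the unknown stochastic feedback."""
    return 2.0 * (x - x_star) + random.gauss(0.0, sigma)


def majority_sign(x, oracle, n_samples=15):
    """Estimate the sign of the expected gradient at x by majority vote.
    The biased random walk only needs each such local decision to be
    correct with probability strictly above 1/2."""
    s = sum(1 if oracle(x) > 0 else -1 for _ in range(n_samples))
    return 1 if s > 0 else -1


def rwt_minimize(oracle, T=3000, n_samples=15):
    """Biased random walk on the binary tree of dyadic sub-intervals of
    [0, 1]; each node is an interval whose children are its two halves.
    The path stack stores the current walk position; the paper further
    achieves O(1) memory via localized processing, which this simplified
    sketch does not reproduce."""
    calls = [0]

    def counted(x):  # wrap the oracle to respect the query budget T
        calls[0] += 1
        return oracle(x)

    path = [(0.0, 1.0)]  # stack of intervals = current root-to-node path
    while calls[0] + 3 * n_samples <= T:
        lo, hi = path[-1]
        mid = 0.5 * (lo + hi)
        if len(path) > 1 and majority_sign(lo, counted, n_samples) > 0:
            path.pop()  # expected gradient points left of lo: backtrack
        elif len(path) > 1 and majority_sign(hi, counted, n_samples) < 0:
            path.pop()  # expected gradient points right of hi: backtrack
        elif majority_sign(mid, counted, n_samples) > 0:
            path.append((lo, mid))  # descend into left child
        else:
            path.append((mid, hi))  # descend into right child
    lo, hi = path[-1]
    return 0.5 * (lo + hi)


print(rwt_minimize(noisy_grad))  # should approach x_star = 0.3
```

Each step performs at most three constant-size local tests, so per-query computation is $\mathcal{O}(1)$; a wrong local decision only moves the walk one level away, and the downward bias pulls it back, which is the mechanism behind the geometric convergence claimed above.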
