Published on Sun Nov 03 2019

Zeroth Order Non-convex Optimization with Dueling-Choice Bandits

Yichong Xu, Aparna Joshi, Aarti Singh, Artur Dubrawski
Abstract

We consider a novel setting of zeroth order non-convex optimization, where in addition to querying the function value at a given point, we can also duel two points and get the point with the larger function value. We refer to this setting as optimization with dueling-choice bandits since both direct queries and duels are available for optimization. We give the COMP-GP-UCB algorithm based on GP-UCB (Srinivas et al., 2009), where instead of directly querying the point with the maximum Upper Confidence Bound (UCB), we perform constrained optimization and use comparisons to filter out suboptimal points. COMP-GP-UCB comes with a theoretical guarantee of $O(\sqrt{\Phi/T})$ on simple regret, where $T$ is the number of direct queries and $\Phi$ is an improved information gain corresponding to a comparison-based constraint set that restricts the search space for the optimum. In contrast, in the direct-query-only setting, the information gain depends on the entire domain. Finally, we present experimental results to show the efficacy of our algorithm.
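The loop below is a minimal sketch of the comparison-filtered UCB idea described in the abstract, written in plain Python/NumPy. It is not the authors' COMP-GP-UCB: the RBF kernel, the constant exploration weight `beta`, the fixed candidate grid, and the one-duel-per-round filtering rule are simplifying assumptions made here, and the duel oracle is simulated with noiseless function evaluations.

```python
# Sketch only: NOT the authors' COMP-GP-UCB. Kernel, beta schedule, grid,
# and filtering rule are illustrative assumptions.
import numpy as np

def rbf_kernel(A, B, ls=0.2):
    diff = A[:, None, :] - B[None, :, :]
    return np.exp(-0.5 * (diff ** 2).sum(-1) / ls ** 2)

def gp_posterior(X, y, Xs, noise=1e-4):
    # Standard GP-regression posterior mean/std at the candidate points Xs.
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(X, Xs)
    alpha = np.linalg.solve(K, y)
    v = np.linalg.solve(K, Ks)
    mu = Ks.T @ alpha
    var = np.clip(1.0 - (Ks * v).sum(0), 1e-12, None)  # k(x, x) = 1 for RBF
    return mu, np.sqrt(var)

def comp_gp_ucb_sketch(f, candidates, T, beta=2.0, seed=0):
    rng = np.random.default_rng(seed)
    active = np.ones(len(candidates), dtype=bool)   # duel-filtered search set
    queried = [int(rng.integers(len(candidates)))]  # random warm-start query
    y = np.array([f(candidates[queried[-1]])])
    for _ in range(T - 1):  # T is the budget of (expensive) direct queries
        mu, sd = gp_posterior(candidates[queried], y, candidates)
        ucb = np.where(active, mu + beta * sd, -np.inf)
        i = int(np.argmax(ucb))                     # constrained UCB argmax
        best = candidates[queried[int(np.argmax(y))]]
        # Cheap duel against the empirical best: if candidate i loses, then
        # f(x_i) < f(best) <= f(x*), so x_i cannot be the maximizer and is
        # removed from the search space (noisy duels would need repetition).
        if f(candidates[i]) < f(best):
            active[i] = False
            ucb[i] = -np.inf
            i = int(np.argmax(ucb))
        queried.append(i)                           # spend one direct query
        y = np.append(y, f(candidates[i]))
    return candidates[queried[int(np.argmax(y))]]

# Example: maximize a multimodal 1-D function over a grid of candidates.
grid = np.linspace(0.0, 1.0, 200)[:, None]
f = lambda x: float(np.sin(13 * x[0]) * np.sin(27 * x[0]))
print(comp_gp_ucb_sketch(f, grid, T=30))
```

The key point the sketch tries to convey is that a point losing a duel against any already-evaluated point can be discarded from the search for the maximizer, so cheap comparisons shrink the set over which the UCB optimization runs.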

Sat May 08 2021
Machine Learning
A Simple yet Universal Strategy for Online Convex Optimization
Fri Apr 08 2016
Machine Learning
A Low Complexity Algorithm with $O(\sqrt{T})$ Regret and $O(1)$ Constraint Violations for Online Convex Optimization with Long Term Constraints
This paper considers online convex optimization over a complicated constraint set. The conventional online projection algorithm (Zinkevich, 2003) can be difficult to implement. In this paper, we relax the functional constraints by allowing them to be violated at each round.
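To make the relaxation concrete, here is a hedged sketch of the generic primal-dual treatment of long-term constraints (in the style of Mahdavi et al., 2012), not the specific low-complexity algorithm of this paper: fold the functional constraint g(x) <= 0 into a Lagrangian, project only onto a simple set, and let an online multiplier keep the cumulative violation small.

```python
# Generic primal-dual sketch for long-term constraints; step sizes, the box
# projection, and the multiplier update are illustrative assumptions, not
# the paper's algorithm.
import numpy as np

def ogd_long_term_constraints(loss_grad, g, g_grad, x0, eta=0.05, T=1000):
    x, lam = x0.copy(), 0.0
    xs = []
    for t in range(T):
        grad_f = loss_grad(t, x)               # gradient of the round-t loss
        # Primal step on the Lagrangian f_t(x) + lam * g(x); only a cheap
        # projection onto a simple set (here a box) is needed, never onto
        # the functional constraint set {x : g(x) <= 0}.
        x = np.clip(x - eta * (grad_f + lam * g_grad(x)), -1.0, 1.0)
        # Dual ascent: the multiplier grows while the constraint is violated,
        # so per-round violations are allowed but stay small in the long run.
        lam = max(0.0, lam + eta * g(x))
        xs.append(x.copy())
    return np.mean(xs, axis=0)                 # averaged iterate

# Example: losses f_t(x) = ||x - c_t||^2 with constraint g(x) = x[0] - 0.2 <= 0.
rng = np.random.default_rng(0)
cs = rng.normal(0.3, 0.1, size=(1000, 2))
x_bar = ogd_long_term_constraints(
    loss_grad=lambda t, x: 2 * (x - cs[t]),
    g=lambda x: x[0] - 0.2,
    g_grad=lambda x: np.array([1.0, 0.0]),
    x0=np.zeros(2))
print(x_bar)  # first coordinate should settle near the 0.2 boundary
```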
Thu Oct 15 2020
Machine Learning
Continuum-Armed Bandits: A Function Space Perspective
Continuum-armed bandits (a.k.a., black-box or zeroth-order optimization) involves optimizing an unknown objective function given an oracle. In both noiseless and noisy conditions, we derive minimax rates under simple and cumulative regrets.
Thu Jul 16 2020
Machine Learning
Comparator-adaptive Convex Bandits
We develop convex bandit algorithms with regret bounds that are small whenever the norm of the comparator is small. We extend the ideas to convex bandits with smooth loss functions, using a new single-point gradient estimator and carefully designed surrogate losses.
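As context for the single-point gradient estimator mentioned above (the paper's own construction may differ in its details), the classical one-point scheme of Flaxman et al. (2005) evaluates the loss once at a randomly perturbed point and rescales; the result is an unbiased estimate of the gradient of a smoothed version of the loss.

```python
# Classical one-point gradient estimator (Flaxman et al., 2005), shown as
# background only; the paper's own estimator may differ.
import numpy as np

def one_point_grad(f, x, delta=0.1, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    u = rng.normal(size=x.shape)
    u /= np.linalg.norm(u)          # uniform random direction on the sphere
    d = x.size
    # Unbiased for the gradient of the delta-smoothed version of f.
    return (d / delta) * f(x + delta * u) * u

# Sanity check on f(z) = ||z||^2, whose true gradient at x is 2x.
rng = np.random.default_rng(0)
x = np.array([1.0, -0.5])
g = np.mean([one_point_grad(lambda z: z @ z, x, delta=0.2, rng=rng)
             for _ in range(40000)], axis=0)
print(g)  # close to [2.0, -1.0]
```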
Thu Apr 29 2021
Machine Learning
Regret Bounds for Gaussian-Process Optimization in Large Domains
Thu May 27 2021
Machine Learning
Conic Blackwell Algorithm: Parameter-Free Convex-Concave Saddle-Point Solving
We develop new parameter- and scale-free algorithms for solving convex-concave saddle-point problems. Our results are based on a new simple regret minimizer, the Conic Blackwell Algorithm.
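The blurb does not spell out the Conic Blackwell Algorithm itself, so the sketch below shows only the standard construction such a regret minimizer plugs into: run one minimizer per player in self-play on a matrix game, and the averaged strategies converge to a saddle point. Classical regret matching (itself derived from Blackwell approachability) stands in for CBA here.

```python
# Saddle-point solving for min_x max_y x^T A y via regret-minimizer self-play.
# Regret matching is a stand-in for the Conic Blackwell Algorithm, whose
# exact update is not given in the blurb above.
import numpy as np

class RegretMatcher:
    def __init__(self, n):
        self.regrets = np.zeros(n)
        self.n = n

    def strategy(self):
        pos = np.maximum(self.regrets, 0.0)
        s = pos.sum()
        return pos / s if s > 0 else np.full(self.n, 1.0 / self.n)

    def observe(self, utilities):
        # Accumulate regret of each pure action vs. the strategy just played.
        p = self.strategy()
        self.regrets += utilities - p @ utilities

def solve_game(A, T=5000):
    m, n = A.shape
    px, py = RegretMatcher(m), RegretMatcher(n)   # x minimizes, y maximizes
    x_avg, y_avg = np.zeros(m), np.zeros(n)
    for _ in range(T):
        x, y = px.strategy(), py.strategy()
        x_avg += x
        y_avg += y
        px.observe(-(A @ y))      # x's utility is -x^T A y
        py.observe(A.T @ x)       # y's utility is  x^T A y
    return x_avg / T, y_avg / T

# Example: matching pennies; the saddle point is uniform play with value 0.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x, y = solve_game(A)
print(x, y, x @ A @ y)  # both near [0.5, 0.5], value near 0
```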