bandits – Optimization Online

Optimism in the Face of Ambiguity Principle for Multi-Armed Bandits

Published: 2024/10/07, Updated: 2025/02/13

Categories: Convex Optimization, Data Science Algorithms, Data Science Applications
Tags: bandits, discrete choice models, online learning

Follow-The-Regularized-Leader (FTRL) algorithms often enjoy optimal regret for adversarial as well as stochastic bandit problems and allow for a streamlined analysis. However, FTRL algorithms require the solution of an optimization problem in every iteration and are thus computationally challenging. In contrast, Follow-The-Perturbed-Leader (FTPL) algorithms achieve computational efficiency by perturbing the estimates of the rewards of …
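To make the FTPL idea concrete, here is a minimal Python sketch of a generic perturbed-leader bandit loop. It is not the algorithm proposed in the paper, whose abstract is truncated above. The choice of Gumbel perturbations is an assumption made for illustration: with Gumbel noise the arm-selection probabilities have a closed softmax form (the Gumbel-max trick), so importance-weighted loss estimates can be computed without solving an optimization problem. The names `ftpl_bandit`, `reward_fn`, and `eta` are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel(rng):
    """Draw one standard Gumbel sample by inverse transform."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def ftpl_bandit(reward_fn, n_arms, horizon, eta=0.1, seed=0):
    """Illustrative FTPL loop with Gumbel perturbations (an assumption).

    reward_fn(arm, rng) must return a reward in [0, 1].
    Returns (pull counts per arm, total collected reward).
    """
    rng = random.Random(seed)
    loss_est = [0.0] * n_arms  # cumulative importance-weighted loss estimates
    pulls = [0] * n_arms
    total_reward = 0.0
    for _ in range(horizon):
        # Follow the perturbed leader: add fresh noise, then take the argmax.
        perturbed = [-eta * loss_est[a] + gumbel(rng) for a in range(n_arms)]
        arm = max(range(n_arms), key=perturbed.__getitem__)
        reward = reward_fn(arm, rng)
        # With Gumbel noise, the selection probabilities are a softmax.
        probs = softmax([-eta * e for e in loss_est])
        # Only the pulled arm is observed, so reweight its loss by 1 / prob.
        loss_est[arm] += (1.0 - reward) / probs[arm]
        pulls[arm] += 1
        total_reward += reward
    return pulls, total_reward
```

For example, `ftpl_bandit(lambda arm, rng: float(arm == 1), n_arms=2, horizon=200, eta=1.0)` concentrates its pulls on arm 1, the only arm with nonzero reward. For other perturbation distributions the selection probabilities generally lack a closed form and would have to be estimated, which this sketch does not attempt.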