Towards Optimal Offline Reinforcement Learning

We study offline reinforcement learning problems with a long-run average reward objective. The state-action pairs generated by any fixed behavioral policy thus follow a Markov chain, and the empirical state-action-next-state distribution satisfies a large deviations principle. We use the rate function of this large deviations principle to construct an uncertainty set for the unknown true …
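As a rough illustration of the construction hinted at above (a schematic sketch only, not the exact formulation of the paper), the pair empirical measure of a finite Markov chain obeys a large deviations principle whose rate function is a conditional relative entropy, and such a rate function can be inverted into a data-driven uncertainty set around the empirical transition counts:

```latex
I_P(\Gamma) \;=\; \sum_{s,a,s'} \Gamma(s,a,s')\,
  \log\frac{\Gamma(s'\mid s,a)}{P(s'\mid s,a)},
\qquad
\mathcal{P}_n \;=\; \Big\{ P \;:\; I_P\big(\hat{\Gamma}_n\big) \le \tfrac{r}{n} \Big\},
```

where \(\hat{\Gamma}_n\) denotes the empirical state-action-next-state distribution after \(n\) transitions, \(P\) a candidate transition kernel, and \(r\) a confidence radius; the precise rate function in the paper (e.g., how it accounts for the behavioral policy) may differ.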

Optimism in the Face of Ambiguity Principle for Multi-Armed Bandits

Follow-The-Regularized-Leader (FTRL) algorithms often enjoy optimal regret for adversarial as well as stochastic bandit problems and allow for a streamlined analysis. However, FTRL algorithms require the solution of an optimization problem in every iteration and are thus computationally challenging. In contrast, Follow-The-Perturbed-Leader (FTPL) algorithms achieve computational efficiency by perturbing the estimates of the rewards of …
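The computational contrast between FTRL and FTPL can be made concrete with a small sketch. The code below is not the algorithm proposed in the paper but a generic FTPL loop for a K-armed bandit: cumulative loss estimates are perturbed with fresh random noise each round, the arm minimizing the perturbed total is played, and geometric resampling supplies inverse-probability weights, so no optimization problem is ever solved. The names `ftpl_bandit`, `loss_fn`, the learning rate `eta`, and the exponential perturbation are illustrative assumptions.

```python
import numpy as np

def ftpl_bandit(K, T, loss_fn, eta=0.1, rng=None):
    """Minimal Follow-The-Perturbed-Leader sketch for a K-armed bandit.

    Instead of solving an optimization problem each round (as FTRL does),
    FTPL draws a fresh random perturbation, adds it to the cumulative loss
    estimates, and plays the arm with the smallest perturbed total.
    """
    rng = np.random.default_rng() if rng is None else rng
    loss_est = np.zeros(K)          # cumulative importance-weighted loss estimates
    total_loss = 0.0
    for t in range(T):
        # Perturb cumulative estimates; exponential noise is one common choice.
        z = rng.exponential(size=K)
        arm = int(np.argmin(eta * loss_est - z))
        loss = loss_fn(arm, t)      # observe the loss of the played arm only
        # Geometric resampling: redraw perturbations until the same arm is
        # selected again; the number of redraws estimates 1 / P(play arm).
        m, cap = 1, 10_000
        while m < cap:
            z2 = rng.exponential(size=K)
            if int(np.argmin(eta * loss_est - z2)) == arm:
                break
            m += 1
        loss_est[arm] += loss * m   # importance-weighted loss estimate
        total_loss += loss
    return total_loss
```

For instance, `loss_fn = lambda arm, t: float(np.random.binomial(1, [0.7, 0.5, 0.3][arm]))` would simulate a three-armed stochastic bandit; only one perturbed argmin per round (plus resampling draws) is needed, which is the efficiency advantage the abstract alludes to.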

Distributionally Robust Optimization with Markovian Data

We study a stochastic program where the probability distribution of the uncertain problem parameters is unknown and only indirectly observed via finitely many correlated samples generated by an unknown Markov chain with d states. We propose a data-driven distributionally robust optimization model to estimate the problem’s objective function and optimal solution. By leveraging results from …
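Schematically, and with notation assumed here rather than taken from the paper, such a data-driven distributionally robust model replaces the unknown distribution with a worst case over a ball, defined through a large deviations rate function, around the empirical distribution of the correlated samples:

```latex
\hat{x}_n \;\in\; \arg\min_{x \in X} \;
  \sup_{P \in \mathcal{P}_n} \; \mathbb{E}_{\xi \sim P}\big[\, c(x,\xi) \,\big],
\qquad
\mathcal{P}_n \;=\; \big\{ P \;:\; I_P\big(\hat{\mu}_n\big) \le r_n \big\},
```

where \(c(x,\xi)\) is the cost of decision \(x\) under parameter realization \(\xi\), \(\hat{\mu}_n\) the empirical distribution of the \(n\) samples from the d-state Markov chain, \(I_P\) the associated rate function, and \(r_n\) a radius controlling the confidence level; the paper's concrete estimator may be formulated differently.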