We consider decision-making problems with contextual information, in which the reward function involves uncertain parameters that can be predicted from covariates. To quantify the uncertainty of the reward, we propose a new parameter uncertainty set built from a supervised learning oracle. We show that, when the uncertainty set is sized properly, the worst-case (respectively, best-case) reward over it serves as a lower (respectively, upper) confidence bound on the reward. Building on these results, we develop performance guarantees for robust contextual optimization in the offline setting, and we propose data-driven optimistic optimization as a systematic tool for online contextual decision-making with provable performance guarantees.
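To make the worst-case/best-case idea concrete, here is a minimal sketch under illustrative assumptions that are not taken from the paper. It assumes a linear reward θᵀw over a finite set of actions w, an ordinary-least-squares model standing in for the supervised learning oracle, and an ellipsoidal uncertainty set centered at the oracle's prediction θ̂. The radius ρ is sized by a held-out residual quantile, which is a split-conformal-style heuristic. All function names (`fit_oracle`, `calibrate_radius`, `reward_bounds`, `choose`) are hypothetical. Robust (offline) selection maximizes the lower bound, and optimistic (online) selection maximizes the upper bound.

```python
import numpy as np


def fit_oracle(X, Y):
    """Hypothetical learning oracle (an assumption): least squares from
    covariates X (n x p) to observed parameter vectors Y (n x d).
    Predict the parameter for covariate x with x @ B."""
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return B


def calibrate_radius(X_val, Y_val, B, Sigma_inv, alpha=0.1):
    """Heuristic sizing (an assumption): the (1 - alpha) empirical quantile
    of the Mahalanobis norms of held-out prediction residuals."""
    R = Y_val - X_val @ B
    norms = np.sqrt(np.einsum("ij,jk,ik->i", R, Sigma_inv, R))
    return float(np.quantile(norms, 1 - alpha))


def reward_bounds(w, theta_hat, Sigma, rho):
    """Worst- and best-case of the linear reward theta @ w over the ellipsoid
    {theta : (theta - theta_hat)^T Sigma^{-1} (theta - theta_hat) <= rho^2}.
    By Cauchy-Schwarz these equal theta_hat @ w -/+ rho * sqrt(w^T Sigma w)."""
    center = float(theta_hat @ w)
    radius = rho * float(np.sqrt(w @ Sigma @ w))
    return center - radius, center + radius


def choose(actions, theta_hat, Sigma, rho, mode):
    """Pick an index from a finite action set: 'robust' maximizes the
    worst-case reward, 'optimistic' maximizes the best-case reward."""
    idx = 0 if mode == "robust" else 1
    scores = [reward_bounds(np.asarray(w, float), theta_hat, Sigma, rho)[idx]
              for w in actions]
    return int(np.argmax(scores))


if __name__ == "__main__":
    # Synthetic data: parameters depend linearly on covariates plus noise.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    Y = X @ rng.normal(size=(3, 2)) + 0.1 * rng.normal(size=(200, 2))
    B = fit_oracle(X[:100], Y[:100])
    # Ellipsoid shape: covariance of in-sample residuals (a simple choice).
    Sigma = np.cov((Y[:100] - X[:100] @ B).T)
    rho = calibrate_radius(X[100:], Y[100:], B, np.linalg.inv(Sigma))
    theta_hat = rng.normal(size=3) @ B  # prediction for a new context
    actions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    print("robust:", choose(actions, theta_hat, Sigma, rho, "robust"))
    print("optimistic:", choose(actions, theta_hat, Sigma, rho, "optimistic"))
```

The ellipsoid is chosen here only because the extremes of a linear reward over it have a closed form. Actions with a high predicted reward but a large spread √(wᵀΣw) are penalized by the robust rule and favored by the optimistic rule, which is what drives exploration in the online setting.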
Contextual Decision-making under Parametric Uncertainty and Data-driven Optimistic Optimization