Optimal Learning for Structured Bandits
We study structured multi-armed bandits, the problem of online decision-making under uncertainty in the presence of structural information. In this problem, the decision-maker needs to discover the best course of action despite observing only uncertain rewards over time. The decision-maker is aware of certain structural information regarding the reward distributions and would …