Learning the Follower’s Objective Function in Sequential Bilevel Games

We consider bilevel optimization problems in which the leader has no or only partial knowledge about the objective function of the follower. The studied setting is a sequential one in which the bilevel game is played repeatedly. This allows the leader to learn the objective function (values) of the follower over time. We focus on two methods: a multiplicative weight update (MWU) method and one based on the lower-level’s KKT conditions that are used in the sense of inverse optimization. The MWU method requires less assumptions but the convergence guarantee is also only on the follower’s objective function values, whereas the inverse KKT method requires stronger assumptions but actually allows to learn objective functions that are consistent with the already observed interactions between the two players. Although the theory we present is only related to the lower-level and not to the upper-level problem, we show that the gained information are practically useful for the leader by illustrating that, over time, the leader’s objective function values tend to those that would be obtained under full information. The applicability of the proposed methods is shown using two case studies. First, we study a repeatedly played continuous knapsack interdiction problem and, second, a sequential bilevel pricing game in which the leader needs to learn the utility function of the follower. For both problems, we further illustrate the impact of this learning on the leader’s decisions.

Article

Download