We study the problem of online learning (OL) from revealed preferences: a learner wishes to learn an agent's private utility function by observing the agent's utility-maximizing actions in a changing environment. We adopt an online inverse optimization setup, in which the learner observes a stream of the agent's actions in an online fashion and learning performance is measured by the regret associated with a loss function. Due to the inverse optimization component, attaining or proving convexity is difficult for all of the usual loss functions in the literature. We address this challenge by designing a new loss function that is convex under relatively mild assumptions. Moreover, we establish that the regret with respect to this new loss function also bounds the regret with respect to all of the usual loss functions. This allows us to design a flexible OL framework that treats these loss functions in a unified manner and supports a variety of online convex optimization algorithms. We provide theoretical and empirical evidence that our framework based on the new loss function, in particular with online Mirror Descent, requires fewer technical assumptions and achieves better regret and solution-time performance than other OL algorithms from the literature.
Online Convex Optimization Perspective for Learning from Dynamically Revealed Preferences
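To make the setup in the abstract concrete, the following is a minimal, self-contained sketch of the online loop: an agent repeatedly maximizes a private linear utility over a changing budget set, and the learner updates its utility estimate with online Mirror Descent (an exponentiated-gradient update on the simplex). This is not the paper's algorithm or its new loss function; the knapsack-style feasible set, the generic suboptimality surrogate loss, and all parameter values are illustrative assumptions.

```python
import numpy as np

def best_response(theta, prices, budget):
    """Utility-maximizing action for linear utility over a budget set:
    max theta^T x  s.t.  prices^T x <= budget, 0 <= x <= 1 (fractional knapsack)."""
    x = np.zeros_like(theta)
    remaining = budget
    # Greedy fill by utility-per-unit-cost ratio.
    for i in np.argsort(-theta / prices):
        take = min(1.0, remaining / prices[i])
        x[i] = take
        remaining -= take * prices[i]
        if remaining <= 1e-12:
            break
    return x

rng = np.random.default_rng(0)
d, T, eta, budget = 5, 200, 0.5, 2.0       # illustrative dimensions and step size

theta_true = rng.dirichlet(np.ones(d))      # agent's private utility weights (unknown)
theta_hat = np.ones(d) / d                  # learner's estimate, kept on the simplex

cum_loss = 0.0
for t in range(T):
    prices = rng.uniform(0.5, 1.5, size=d)                 # changing environment
    x_agent = best_response(theta_true, prices, budget)    # observed agent action
    x_pred = best_response(theta_hat, prices, budget)      # learner's predicted action

    # Generic convex suboptimality-style surrogate (a stand-in, not the paper's loss):
    #   l_t(theta) = max_x theta^T x - theta^T x_agent,
    # a pointwise maximum of linear functions, with subgradient x_pred - x_agent.
    cum_loss += theta_hat @ (x_pred - x_agent)
    grad = x_pred - x_agent

    # Online Mirror Descent step with the entropic mirror map:
    # exponentiated-gradient update followed by normalization onto the simplex.
    theta_hat = theta_hat * np.exp(-eta * grad)
    theta_hat /= theta_hat.sum()

# Note: the estimate need not match theta_true exactly; only the regret of the
# surrogate loss is driven down over the rounds.
print("cumulative surrogate loss:", round(cum_loss, 4))
print("estimate:", np.round(theta_hat, 3))
print("truth:   ", np.round(theta_true, 3))
```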