We show that the Hedge algorithm, a method that is widely used in Machine Learning, can be interpreted as a particular instance of Dual Averaging schemes, which have recently been introduced by Nesterov for regret minimization. Based on this interpretation, we establish three alternative methods of the Hedge algorithm: one in the form of the original method, but with optimal parameters, one that requires less a priori information, and one that is better adapted to the context of the Hedge algorithm. All our modified methods have convergence results that are better or at least as good as the performance guarantees of the vanilla method. In numerical experiments, our methods significantly outperform the original scheme.
Technical Report, ETH Zurich, December 2011