This paper presents an approach to non-stationary policy search for finite-horizon, discrete-time Markovian decision problems with large state spaces, constrained action sets, and a risk-sensitive optimality criterion. The method models the time-varying policy parameters with a non-parametric response-surface model for an indirectly parametrized policy motivated by the Bellman equation. Through this interpolating approximation, the degree of non-stationarity of the policy, and hence the size of the resulting search problem, can be adjusted. Computational tractability and generality follow from a nested parallel implementation of derivative-free optimization combined with Monte Carlo simulation. We illustrate the efficiency of the approach on an optimal energy-storage charging problem that minimizes a risk functional of the cost. We observe that the improvement gained from non-stationarity depends on the risk functional and is particularly significant for the Value-at-Risk.
Nonstationary Direct Policy Search for Risk-Averse Stochastic Optimization