Hongyuan Zha – Optimization Online

Reliable Off-policy Evaluation for Reinforcement Learning

Published: 2021/01/15

Rui Gao
Jie Wang
Hongyuan Zha

In a sequential decision-making problem, off-policy evaluation estimates the expected cumulative reward of a target policy using logged trajectory data generated by a different behavior policy, without executing the target policy. Reinforcement learning in high-stakes environments, such as healthcare and education, is often limited to off-policy settings due to safety or ethical concerns, or …
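For reference, here is a minimal sketch of the estimation problem described above. It uses the classic trajectory-wise importance sampling estimator, which reweights each logged return by the ratio of target-policy to behavior-policy action probabilities. This is a standard baseline and not the reliable estimator proposed in the paper. The function name `importance_sampling_ope`, the tabular `(state, action, reward)` trajectory format, and the discount factor `gamma` are illustrative assumptions. The sketch also assumes the behavior policy gives positive probability to every logged action.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical format: a trajectory is a sequence of (state, action, reward)
# triples logged while following the behavior policy.
Trajectory = Sequence[Tuple[int, int, float]]


def importance_sampling_ope(
    trajectories: List[Trajectory],
    target_prob: Callable[[int, int], float],
    behavior_prob: Callable[[int, int], float],
    gamma: float = 1.0,
) -> float:
    """Trajectory-wise importance sampling estimate of the target policy's value.

    Assumes behavior_prob(s, a) > 0 for every logged (s, a) pair.
    """
    estimates = []
    for traj in trajectories:
        weight = 1.0  # product of per-step likelihood ratios pi(a|s) / mu(a|s)
        ret = 0.0     # discounted cumulative reward observed in the log
        for t, (s, a, r) in enumerate(traj):
            weight *= target_prob(s, a) / behavior_prob(s, a)
            ret += (gamma ** t) * r
        estimates.append(weight * ret)
    # Average the reweighted returns over all logged trajectories.
    return sum(estimates) / len(estimates)


if __name__ == "__main__":
    # Toy example: one state, two actions; action 1 yields reward 1.
    # The behavior policy is uniform; the target picks action 1 with prob 0.9.
    behavior = lambda s, a: 0.5
    target = lambda s, a: 0.9 if a == 1 else 0.1
    logs = [[(0, 1, 1.0)], [(0, 0, 0.0)]]
    print(importance_sampling_ope(logs, target, behavior))  # → 0.9
```

In this toy log the estimate equals the target policy's true value of 0.9. In general the product of likelihood ratios can have very high variance over long horizons, which is one reason plain importance sampling may be unreliable in the high-stakes settings mentioned above.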