A Generalized Algorithm for Multi-Objective Reinforcement Learning and Policy Adaptation

Multi-Objective Reinforcement Learning (MORL)

Multi-objective RL deals with learning control policies that simultaneously optimize several criteria. In this setting, the optimal policy depends on the relative preferences among the competing criteria. The MORL framework provides two distinct advantages:

- reduced dependence on scalar reward design
- dynamic adaptation or transfer to related tasks with different preferences

In this paper, the authors' algorithm is based on two key insights:

- the optimality operator for a generalized version of the Bellman equation with preferences is a valid contraction
- optimizing for the convex envelope of multi-objective Q-values ensures an efficient alignment between preferences and the corresponding optimal policies
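To make the convex-envelope idea more concrete, here is a minimal sketch of a single envelope-style backup, assuming preferences are linear weights over the objectives and Q-values are vectors with one entry per objective. The function name `envelope_backup`, the array layout, and the toy numbers are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def envelope_backup(q_next, reward, omega, gamma=0.99):
    """One envelope-style backup target for a single transition (illustrative).

    q_next : array of shape (num_prefs, num_actions, num_objectives)
             vector Q-values at the next state, evaluated for several
             candidate preferences and every action (assumed layout).
    reward : array of shape (num_objectives,), the vector reward.
    omega  : array of shape (num_objectives,), the current preference.
    """
    # Scalarize every candidate Q-vector under the current preference.
    scalarized = q_next @ omega  # shape (num_prefs, num_actions)

    # Take the maximum over both actions and candidate preferences,
    # then keep the full Q-vector that achieved it.
    best_pref, best_action = np.unravel_index(
        np.argmax(scalarized), scalarized.shape
    )
    return reward + gamma * q_next[best_pref, best_action]


# Toy usage: 2 candidate preferences, 2 actions, 2 objectives.
q_next = np.array([
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [2.0, 0.0]],
])
reward = np.array([1.0, 1.0])
target = envelope_backup(q_next, reward, omega=np.array([0.5, 0.5]), gamma=0.5)
```

The point of the sketch is that the maximization runs over candidate preferences as well as actions, so the target for one preference can reuse a Q-vector that was learned under another preference. The backup returns a vector rather than a scalar, which keeps the per-objective values available when the preference changes later.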

References

[1] Yang, Runzhe, Xingyuan Sun, and Karthik Narasimhan. "A generalized algorithm for multi-objective reinforcement learning and policy adaptation." Advances in Neural Information Processing Systems. 2019.