A Generalized Algorithm for Multi-Objective Reinforcement Learning and Policy Adaptation
Multi-Objective Reinforcement Learning (MORL)
Multi-objective RL deals with learning control policies that
simultaneously optimize several criteria. The optimal policy
depends on the relative preferences among the competing criteria.

The MORL framework provides two distinct advantages:
- reduced dependence on scalar reward design
- dynamic adaptation or transfer to related tasks with different preferences

In this paper [1], the authors' algorithm is based on two key insights:
- the optimality operator for a generalized version of the Bellman equation with preferences is a valid contraction
- optimizing for the convex envelope of multi-objective Q-values ensures an efficient alignment between preferences and corresponding optimal policies
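
The convex-envelope idea can be illustrated with a small tabular sketch. This is not the authors' implementation: the function name `envelope_backup`, the candidate vectors, and the preference weights are all hypothetical, and the backup is simplified to a single (state, action, preference) triple with a known set of next-state candidates. It assumes preferences are linear weights `w`, so a vector Q-value is scored by the dot product `w . Q`.

```python
# Illustrative sketch only; names and numbers are assumptions, not from the paper.

def dot(w, v):
    """Scalarize a vector value v under linear preference w."""
    return sum(wi * vi for wi, vi in zip(w, v))

def envelope_backup(reward, next_q_vectors, w, gamma=0.9):
    """One envelope-style backup for a single (state, action, preference).

    reward: vector reward for the current state-action pair
    next_q_vectors: candidate vector Q-values at the next state, gathered
        over all next actions and all considered preferences
    w: the preference used to score the candidates
    """
    # Select the candidate whose scalarized value w . Q is largest ...
    best = max(next_q_vectors, key=lambda q: dot(w, q))
    # ... but back up the whole vector, not only its scalarized value.
    return [r + gamma * q for r, q in zip(reward, best)]

# Three candidate vector values for two objectives. The third is not
# Pareto-dominated, yet it lies below the convex envelope of the first two.
candidates = [[1.0, 0.0], [0.0, 1.0], [0.4, 0.4]]
zero_reward = [0.0, 0.0]

q_first = envelope_backup(zero_reward, candidates, w=[0.9, 0.1], gamma=1.0)
q_second = envelope_backup(zero_reward, candidates, w=[0.1, 0.9], gamma=1.0)

# Sweep preferences from (0, 1) to (1, 0): the interior point is never selected.
never_chosen = all(
    envelope_backup(zero_reward, candidates, [k / 10, 1 - k / 10], gamma=1.0)
    != [0.4, 0.4]
    for k in range(11)
)
```

In this toy setup, changing the preference switches which vector is backed up, while the candidate below the convex envelope is optimal for no linear preference. That is the sense in which optimizing over the envelope aligns preferences with their corresponding optimal policies.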
References
[1] Yang, Runzhe, Xingyuan Sun, and Karthik Narasimhan. "A generalized algorithm for multi-objective reinforcement learning and policy adaptation." Advances in Neural Information Processing Systems. 2019.