Regularized MDP and IRL
Many recent successful deep reinforcement learning algorithms make use of regularization, generally based on entropy or KL divergence. The paper [1] propose a general theory of regularized MDP that generalizes these approaches in two directions: consider a larger class of regularizers, and consider the general modified policy iteration approach. [3] propose a general framework for entropy-regularized average-reward reinforcement learning in MDPs. [2] adopts the regularized theory into inverse reinforcement learning.
Entropy-Regularized MDPs
The optimal behavior policy can be formulated as the solution of the Bellman optimality equations. In unknown environments with partially known or misspecified models, greeedily solving these equations often results in policies that far from optimal in the true environment. The notion regularization offers a principled way of dealing with this issue; in particular, entropy regularization has proven to be one of the most successful tools.
Regularized MDP
References
[1] @inproceedings{geist2019theory, title={A theory of regularized markov decision processes}, author={Geist, Matthieu and Scherrer, Bruno and Pietquin, Olivier}, booktitle={International Conference on Machine Learning}, pages={2160--2169}, year={2019}, organization={PMLR} }
[2] @article{jeon2020regularized, title={Regularized Inverse Reinforcement Learning}, author={Jeon, Wonseok and Su, Chen-Yang and Barde, Paul and Doan, Thang and Nowrouzezahrai, Derek and Pineau, Joelle}, journal={arXiv preprint arXiv:2010.03691}, year={2020} }
[3] @article{neu2017unified, title={A unified view of entropy-regularized markov decision processes}, author={Neu, Gergely and Jonsson, Anders and G{'o}mez, Vicen{}}, journal={arXiv preprint arXiv:1705.07798}, year={2017} }