Bandit Algorithms

Decision-making under uncertainty is a challenge we all face, and bandits provide a simple model of this dilemma. Finding the right balance between exploration and exploitation is at the heart of all bandit problems.

The Language of Bandits

A bandit problem is a sequential game between a learner and an environment. The game is played over $n$ rounds, where $n$ is a positive natural number called the horizon. In each round $t \in \{1, \dots, n\}$, the learner first chooses an action $A_t$ from a given set $\mathcal{A}$, and the environment then reveals a reward $X_t \in \mathbb{R}$.
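
The round-by-round protocol described above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function `play`, the variable names (`n` for the horizon, `a_t` for the chosen action, `x_t` for the revealed reward), the two-armed Bernoulli environment, and the uniformly random learner are all assumptions introduced here for the example.

```python
import random


def play(n, actions, choose_action, reveal_reward):
    """Run the learner-environment game for n rounds and return the history."""
    history = []  # (action, reward) pairs observed so far
    for t in range(1, n + 1):
        # The learner moves first and may only use past observations.
        a_t = choose_action(t, history)
        if a_t not in actions:
            raise ValueError(f"action {a_t!r} is not in the action set")
        # The environment then reveals a real-valued reward.
        x_t = reveal_reward(t, a_t)
        history.append((a_t, x_t))
    return history


# Hypothetical instance: two actions with Bernoulli rewards (assumed means).
rng = random.Random(0)
means = {"left": 0.3, "right": 0.7}


def uniform_learner(t, history):
    """A learner that ignores the history and explores uniformly at random."""
    return rng.choice(sorted(means))


def bernoulli_env(t, action):
    """Reward is 1.0 with probability means[action], otherwise 0.0."""
    return 1.0 if rng.random() < means[action] else 0.0


history = play(10, set(means), uniform_learner, bernoulli_env)
total_reward = sum(x for _, x in history)
```

Passing the learner and the environment as separate functions mirrors the two-player structure of the game: a learner that balances exploration and exploitation can be swapped in for `uniform_learner` without touching the loop.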

References

[1] Tor Lattimore and Csaba Szepesvári. Bandit Algorithms. Cambridge University Press, 2020.