Fictitious Self-Play
Fictitious play is a popular game-theoretic model of learning in games. This paper aims at efficiently computing Nash equilibria of imperfect-information games.
In fictitious play, players repeatedly play a game, at each iteration choosing a best repsonse to their opponents' average strategies. The average strategy profile of fictitious players converges to a Nash equilibrium in certain classes of games.
Background
Extensive-Form
Extensive-form games are a model of sequential interaction involving multiple agents.
denotes the set of players. is a set of states corresponding to nodes in a finite rooted game tree. - For each state node
the edges to its successor states define a set of actions available to a player or chance in state . - The player function
, with denoting chance, determines who is to act at a given state. - Chance is considered to be a particular player that follows a fixed randomized strategy that determins the distribution of chance events at chance nodes.
- For each player
there is a corresponding set of information states and an information function that determines which states are indistinguishable for the player by mapping them on the same information state . - We assume games with perfect recall
maps terminal states to a vector whose components correspond to each player's payoff. - A player's behavioural strategy,
, determines a probability distribution over actions given an information state, and is the set of all behavioural strategies of player . - A strategy profile
is a collection of strategies for all players. refers to all strategies in except . is the expected payoff of player if all players follow the strategy profile . - The set of best responses of player
to their opponents' strategies is . - For
defines the set of -best responses to the strategy profile . - A Nash equilibrium of an extensive-form game is a
strategy profile
such that . - An
-Nash equilibrium is a strategy profile such that .
Normal-Form
- An extensive-form game induces an equivalent normal-form game as follows.
- For each player
their deterministic strategies, , define a set of normal-form actions, called pure strategies. - A mixed strategy
for player is a probability distribution over their pure strategies. denote the set of all mixed strategies available to player . - A mixed strategy profile
specifies a mixed strategy for each player determines the expected payoff of player given a mixed strategy profile.
Realization-equivalence
The sequence-form of a game decomposes players' strategies into sequences of actions and probabilities of realizing these sequences.
- For any player
of a perfect-recall extensive-form game, each of their information states uniquely defines a sequence of actions denote the set of such sequences of player . denote the sequence that extends with action .
Definition 1. A realization plan of
player
- A behavioural strategy
induces a realization plan - A realization plan induces a behavioural strategy,
Definition 2. Two strategies
Theorem 3 Two strategies are realization-equivalent if and only if they have the same realization plan.
Theorem 4 For a player with perfect recall, any mixed strategy is realization-equivalent to a behavioural strategy, and vice versa.
Fictitious Play
Definition 5. A generalised weakened
fictitious play is a process of mixed strategies,
with
Reinforcement Learning
An agent is learning on-policy if it gathers these transition tuples by following its own policy. In the off-policy setting an agent is learning from experience of another agent or another policy.
Extensive-Form Fictitious Play
References
[1] @inproceedings{heinrich2015fictitious, title={Fictitious self-play in extensive-form games}, author={Heinrich, Johannes and Lanctot, Marc and Silver, David}, booktitle={International conference on machine learning}, pages={805--813}, year={2015}, organization={PMLR} }