Fictitious Self-Play

Fictitious play is a popular game-theoretic model of learning in games. This paper aims at efficiently computing Nash equilibria of imperfect-information games.

In fictitious play, players repeatedly play a game, at each iteration choosing a best repsonse to their opponents' average strategies. The average strategy profile of fictitious players converges to a Nash equilibrium in certain classes of games.

Background

Extensive-Form

Extensive-form games are a model of sequential interaction involving multiple agents.

denotes the set of players.
is a set of states corresponding to nodes in a finite rooted game tree.
For each state node the edges to its successor states define a set of actions available to a player or chance in state .
The player function , with denoting chance, determines who is to act at a given state.
Chance is considered to be a particular player that follows a fixed randomized strategy that determins the distribution of chance events at chance nodes.
For each player there is a corresponding set of information states and an information function that determines which states are indistinguishable for the player by mapping them on the same information state .
We assume games with perfect recall
maps terminal states to a vector whose components correspond to each player's payoff.
A player's behavioural strategy, , determines a probability distribution over actions given an information state, and is the set of all behavioural strategies of player .
A strategy profile is a collection of strategies for all players.
refers to all strategies in except .
is the expected payoff of player if all players follow the strategy profile .
The set of best responses of player to their opponents' strategies is .
For defines the set of -best responses to the strategy profile .
A Nash equilibrium of an extensive-form game is a strategy profile such that .
An -Nash equilibrium is a strategy profile such that .

Normal-Form

An extensive-form game induces an equivalent normal-form game as follows.
For each player their deterministic strategies, , define a set of normal-form actions, called pure strategies.
A mixed strategy for player is a probability distribution over their pure strategies.
denote the set of all mixed strategies available to player .
A mixed strategy profile specifies a mixed strategy for each player
determines the expected payoff of player given a mixed strategy profile.

Realization-equivalence

The sequence-form of a game decomposes players' strategies into sequences of actions and probabilities of realizing these sequences.

For any player of a perfect-recall extensive-form game, each of their information states uniquely defines a sequence of actions
denote the set of such sequences of player .
denote the sequence that extends with action .

Definition 1. A realization plan of player is a function, , such that and .

A behavioural strategy induces a realization plan
A realization plan induces a behavioural strategy,

Definition 2. Two strategies and of a player are realization-equivalent if for any fixed strategy profile of the other players both strategies, and , define the same probability distribution over the states of the game.

Theorem 3 Two strategies are realization-equivalent if and only if they have the same realization plan.

Theorem 4 For a player with perfect recall, any mixed strategy is realization-equivalent to a behavioural strategy, and vice versa.

Fictitious Play

Definition 5. A generalised weakened fictitious play is a process of mixed strategies, , s.t.

with and as , a sequence of perturbations that satisfies

Reinforcement Learning

An agent is learning on-policy if it gathers these transition tuples by following its own policy. In the off-policy setting an agent is learning from experience of another agent or another policy.

Extensive-Form Fictitious Play

References

[1] @inproceedings{heinrich2015fictitious, title={Fictitious self-play in extensive-form games}, author={Heinrich, Johannes and Lanctot, Marc and Silver, David}, booktitle={International conference on machine learning}, pages={805--813}, year={2015}, organization={PMLR} }

[2]