Further Topics in Games


Several problems in games are characterized through computational complexity classes and lower bounds over several models of computation. Here is a brief description, following Continuous Local Search by Daskalakis and Papadimitriou.

First of all, some definitions about complexity measures, ordered in terms of weakness:

  • Query Complexity (Q): the complexity of an algorithm measured as the number of queries it asks a generative model of the game (see the sketch after this list). The queries can be about either action profiles of all the agents (deterministic) or distributions over action profiles (randomized/stochastic).
  • Communication Complexity (C): the complexity of a (distributed) algorithm measured as the amount of communication each agent (or party) exchanges with the others.
  • Sample Complexity (S): the complexity of an algorithm measured as the number of samples it needs (usually collected in a distributed fashion). In one-step Markov games with deterministic rewards, query complexity and sample complexity coincide, since each query obtains one reward entry.
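For concreteness, here is a minimal sketch of the deterministic query model (the game and all names are made up): an oracle wraps the payoff function and counts action-profile queries, and exhaustive search for a pure Nash equilibrium spends one query per profile, i.e. $\prod_i \vert A_i\vert$ queries in total.

```python
import itertools

class PayoffOracle:
    """Wraps a game's payoff function and counts action-profile queries."""
    def __init__(self, payoff_fn):
        self._payoff_fn = payoff_fn
        self.num_queries = 0

    def query(self, profile):
        self.num_queries += 1
        return self._payoff_fn(profile)

# Made-up 2-player, 2-action game (a prisoner's dilemma) as a payoff table.
table = {
    (0, 0): (3, 3), (0, 1): (0, 5),
    (1, 0): (5, 0), (1, 1): (1, 1),
}
oracle = PayoffOracle(table.__getitem__)
actions = [0, 1]

# Exhaustive search queries the whole table: prod_i |A_i| queries, which is
# already 2^n for n players with binary actions - the scaling behind the
# exponential lower bounds in the tables below.
payoffs = {p: oracle.query(p) for p in itertools.product(actions, repeat=2)}

def is_pure_nash(profile):
    for i in range(2):
        for dev in actions:  # check every unilateral deviation of player i
            deviated = tuple(dev if j == i else a for j, a in enumerate(profile))
            if payoffs[deviated][i] > payoffs[profile][i]:
                return False
    return True

nash = [p for p in payoffs if is_pure_nash(p)]
print(nash, oracle.num_queries)  # [(1, 1)] 4
```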

Problem Complexity:

| Class | Description | Examples |
| --- | --- | --- |
| PPAD | Total search problems where a solution is guaranteed to exist but may be hard to find; the canonical problem is computing approximate fixed points | Computing a Nash or an approximate Nash equilibrium is PPAD-complete |
| PLS | Problems solvable by polynomial-time local search of a potential over a discrete solution space | Finding a local Nash equilibrium; finding a pure equilibrium is PLS-complete |
| CLS | $\text{PPAD} \cap \text{PLS}$: local optimization over a continuous domain (e.g. SGD), finding fixed points of contraction maps | Computing a mixed Nash equilibrium in congestion games and in network coordination games |
| TFNP | Search problems that always have a solution | |

Notes:

  • PPAD problems cannot be NP-complete, for the technical reason that NP is a class of decision problems, whereas the answer to a PPAD problem is always "yes": a solution is known to exist, even though it might be hard to find. PPAD and NP are nonetheless closely related: while the question of whether a Nash equilibrium exists for a given game cannot be NP-hard because the answer is always yes, the question of whether a second equilibrium exists is NP-complete.
  • In potential games, since the potential function is explicitly encoded in the players’ utility functions, the players can find its maximum without any communication (see the best-response sketch after this list).
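As a sketch of the local-search view (the singleton congestion game instance below is made up): best-response dynamics in a potential game is exactly local search on Rosenthal's potential, so it terminates at a pure Nash equilibrium without any communication between players.

```python
import random

# Made-up singleton congestion game: each of the n players picks one of the m
# resources and pays cost[r][k-1] when k players in total picked resource r.
n, m = 4, 3
cost = [[1, 3, 6, 10], [2, 4, 6, 8], [1, 2, 9, 9]]

def load(state, r):
    return sum(1 for choice in state if choice == r)

def player_cost(state, i):
    return cost[state[i]][load(state, state[i]) - 1]

def deviation_cost(state, i, r):
    # Cost player i would pay after unilaterally switching to resource r.
    k = load(state, r) + (0 if state[i] == r else 1)
    return cost[r][k - 1]

def potential(state):
    # Rosenthal's potential: any improving move by a single player decreases
    # it by exactly that player's cost decrease.
    return sum(sum(cost[r][:load(state, r)]) for r in range(m))

random.seed(0)
state = [random.randrange(m) for _ in range(n)]

# Best-response dynamics = local search on the potential, so it must stop at a
# local minimum of the potential, i.e. a pure Nash equilibrium.
improved = True
while improved:
    improved = False
    for i in range(n):
        best = min(range(m), key=lambda r: deviation_cost(state, i, r))
        if deviation_cost(state, i, best) < player_cost(state, i):
            state[i], improved = best, True

print(state, potential(state))  # a pure Nash equilibrium and its potential
```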

Bounds for various equilibria and policy classes, as collected in Comparative Studies on Sample Complexity Bounds in Multi-Agent Reinforcement Learning by Yang:

| $\epsilon$-Nash | Value-Based | Policy Gradient |
| --- | --- | --- |
| Markov Game | $\Omega(e^n)$ [Song et al 2022] *2 | $\Omega(e^n)$ [Song et al 2022] *2 |
| 2-player games | $\tilde O(\frac{H^3}{\epsilon^2} \vert S\vert \prod_i \vert A_i\vert)$ [Liu et al 2021] | Polynomial sample complexity [Daskalakis et al 2020] |
| n-player games | $\tilde O(\frac{H^4}{\epsilon^2} \vert S\vert^2 \prod_i \vert A_i\vert)$ [Liu et al 2021] | |
| n-player, 2 actions | $\Omega(2^n)$ (Q) | $\Omega(2^n)$ (Q) |
| Markov Potential Game | $2^{\Omega(\min(n,\, \log \frac{\Phi_{\max}}{\epsilon}))}$ (C) [Babichenko 2020, Yang 2022] *1 | $2^{\Omega(\min(n,\, \log \frac{\Phi_{\max}}{\epsilon}))}$ (C) [Babichenko 2020, Yang 2022] *1 |
| 2-player, n actions | | |
| | $\tilde O(\Phi_{\max} H^3 (\frac{\vert S\vert}{\epsilon^2} + \frac{\vert S\vert^2}{\epsilon^3}) \sum_i \vert A_i\vert)$ [Song et al 2022] | $\tilde O(\frac{k^4 n \Phi_{\max}}{(1-\gamma)^6 \epsilon^2} \max_i \vert A_i\vert)$ [Ding et al 2022] |
| | | $\tilde O(\frac{\gamma D^2 n \Phi_{\max} \vert S\vert}{(1-\gamma)^5 \epsilon^2} \max_i \vert A_i\vert)$ [Leonardos et al 2022] |
| Cooperative Games | | $\tilde O(\frac{k^n}{(1-\gamma)^4 \epsilon^2} \max_i \vert A_i\vert)$ [Ding et al 2022] |

*1 This bound concerns query and communication complexity: for any algorithm (randomized or deterministic), there exists an instance of the problem of finding a mixed Nash equilibrium in a potential game with $2(n+1)$ players, accessed through noiseless pure queries, that requires at least that many queries to be solved with constant probability. The same complexity holds for approximate well-supported Nash equilibria.

*2 How Long to Equilibrium? The Communication Complexity of Uncoupled Equilibrium Procedures by Hart and Mansour shows that the communication complexity of finding an approximate Nash equilibrium in binary-action $n$-agent games with uncoupled dynamics (where each agent knows only its own payoff) is exponential in the number of agents, $\Omega(2^n)$. The same goes for exact potential games, where the worst-case running time of any algorithm that finds a pure Nash equilibrium in all potential games has the exponential lower bound $\Omega(2^n/\sqrt n)$. This result mirrors the $\Omega(2^n)$ query complexity lower bound for the same games in Query Complexity of Approximate Nash Equilibria by Babichenko, as well as the communication complexity lower bound for the same games with two parties, each observing and communicating on behalf of $n/2$ agents.

| $\epsilon$-CCE | stationary Markov | non-stationary Markov | non-Markov |
| --- | --- | --- | --- |
| Value-Based | PPAD-hard [Daskalakis et al 2022] | $\tilde O(\frac{H^4}{\epsilon^2} \vert S\vert^2 \prod_i \vert A_i\vert)$ [Liu et al 2021]; $\tilde O(\frac{H^{10}}{\epsilon^3} \vert S\vert^3 \max_i \vert A_i\vert)$ [Daskalakis et al 2022] | $\tilde O(\frac{H^5}{\epsilon^2} \vert S\vert \max_i \vert A_i\vert)$ [Song et al 2022] |
| PG | PPAD-hard [Daskalakis et al 2022] | N/A | N/A |
| $\epsilon$-CE | non-stationary Markov | non-Markov |
| --- | --- | --- |
| | $\text{poly}(n)$ payoff queries (Q) [Babichenko 2018] * | $\text{poly}(n)$ payoff queries (Q) [Babichenko 2018] * |
| Value-Based | $\tilde O(\frac{H^4}{\epsilon^2} \vert S\vert^2 \prod_i \vert A_i\vert)$ [Liu et al 2021] | $\tilde O(\frac{H^6}{\epsilon^2} \vert S\vert \max_i \vert A_i\vert^2)$ [Song et al 2022] |
| PG | N/A | N/A |

*There exists a randomized algorithm for computing an approximate correlated equilibrium using only $\text{poly}(n)$ payoff queries. Such a result is achieved by regret-minimizing algorithms (for instance, the regret matching algorithm sketched below). The result is surprising because after $\text{poly}(n)$ queries the algorithm knows only a tiny fraction of the payoffs in the game, $\frac{\text{poly}(n)}{\prod_i \vert A_i\vert}$. Nevertheless, the algorithm knows (with high probability) that a certain distribution forms an approximate correlated equilibrium.
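Here is a minimal sketch of the regret matching idea on a made-up two-player game (all names and the game instance are illustrative; this follows Hart and Mas-Colell's procedure, under which the empirical joint distribution of play approaches the set of correlated equilibria):

```python
import random
from collections import defaultdict

random.seed(0)

# Made-up 2-player game: matching pennies, as a payoff table.
n_actions = 2
payoff = {
    (0, 0): (1, -1), (0, 1): (-1, 1),
    (1, 0): (-1, 1), (1, 1): (1, -1),
}

T = 20000
mu = 4.0  # normalization constant, large enough to keep probabilities valid
cum = [defaultdict(float) for _ in range(2)]   # cumulative swap regrets a -> b
last = [random.randrange(n_actions) for _ in range(2)]
empirical = defaultdict(int)                   # joint empirical play counts

for t in range(1, T + 1):
    # Each player switches from its last action a to b with probability
    # proportional to the positive average regret for the swap a -> b,
    # and keeps playing a with the remaining probability.
    current = []
    for i in range(2):
        a = last[i]
        probs = [max(cum[i][(a, b)], 0.0) / (mu * t) if b != a else 0.0
                 for b in range(n_actions)]
        probs[a] = max(1.0 - sum(probs), 0.0)
        r, chosen = random.random(), a
        for b, p in enumerate(probs):
            r -= p
            if r < 0:
                chosen = b
                break
        current.append(chosen)
    profile = tuple(current)
    empirical[profile] += 1
    # Update swap regrets: what would player i have gained by playing b
    # every time it actually played a?
    for i in range(2):
        a = profile[i]
        for b in range(n_actions):
            if b == a:
                continue
            swapped = tuple(b if j == i else x for j, x in enumerate(profile))
            cum[i][(a, b)] += payoff[swapped][i] - payoff[profile][i]
    last = list(profile)

# The empirical joint distribution approaches the set of correlated equilibria
# (in this zero-sum game, every correlated equilibrium gives each player the
# value of the game, 0).
print({p: c / T for p, c in sorted(empirical.items())})
```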

Inefficiency of equilibria and bounds

The price of anarchy, the most popular measure of the inefficiency of equilibria, resolves the issue of multiple equilibria by adopting a worst-case approach. Precisely, the price of anarchy of a game is defined as the ratio between the worst objective function value of an equilibrium of the game and that of an optimal outcome

\[\text{POA} = \frac{\max_{\pi \in \text{NE(G)}}V^\pi_{\text{TOT}}(s_0)}{V^*(s_0)}\]

We might be interested in identifying games in which the price of anarchy is close to 1: in these games, all equilibria are good approximations of an optimal outcome, and we can view selfish behavior as benign. Put differently, the benefit provided by (possibly costly or infeasible) dictatorial control over the players’ actions is reasonably small. Note that a game with multiple equilibria has a large price of anarchy even if only one of its equilibria is highly inefficient.

The price of stability is a measure of inefficiency designed to differentiate between games in which all equilibria are inefficient and those in which only some equilibria are inefficient. Formally, the price of stability of a game is the ratio between the best objective function value of one of its equilibria and that of an optimal outcome (see the toy computation after the formula).

\[\text{POS} = \frac{\min_{\pi \in \text{NE(G)}}V^\pi_{\text{TOT}}(s_0)}{V^*(s_0)}\]
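As a toy computation (the cost matrix below is made up; players minimize their own cost and the social objective is the total cost, matching the max/min conventions in the two formulas above), POA and POS can be obtained directly by enumerating the pure Nash equilibria of a small game:

```python
import itertools

# Made-up 2-player cost matrix game: entry (a1, a2) -> (c1, c2).
cost = {
    (0, 0): (1, 1), (0, 1): (5, 4),
    (1, 0): (4, 5), (1, 1): (3, 3),
}
actions = [0, 1]

def is_pure_nash(p):
    for i in range(2):
        for dev in actions:  # no unilateral deviation may lower player i's cost
            q = tuple(dev if j == i else a for j, a in enumerate(p))
            if cost[q][i] < cost[p][i]:
                return False
    return True

total = {p: sum(cost[p]) for p in itertools.product(actions, repeat=2)}
nash = [p for p in total if is_pure_nash(p)]
opt = min(total.values())

poa = max(total[p] for p in nash) / opt   # worst equilibrium vs optimum
pos = min(total[p] for p in nash) / opt   # best equilibrium vs optimum
print(nash, poa, pos)  # [(0, 0), (1, 1)] 3.0 1.0
```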

Finally, the price of total anarchy generalizes the price of anarchy by ranging over coarse correlated equilibria instead of Nash equilibria:

\[\text{POTA} = \frac{\max_{\pi \in \text{CCE(G)}}V^\pi_{\text{TOT}}(s_0)}{V^*(s_0)}\]

To obtain bounds, theoretical results on smooth games are used. These are summarized as a class of games where “the externality imposed on any one player by the others is bounded” (Roughgarden, 2015); formally, a game is $(\lambda, \mu)$-smooth if

\[\sum_i V_i^{(\pi_i', \pi_{-i})} \leq \lambda V^{\pi'}_{\text{TOT}} + \mu V^{\pi}_{\text{TOT}} \quad \forall \; \pi, \pi'\]

and for such games the price of (total) anarchy is at most $\frac{\lambda}{1-\mu}$.
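A one-line sketch of why smoothness gives this bound (stated for a cost objective with $\mu < 1$, so that lower $V$ is better and the equilibrium condition reads $V_i^{\pi} \leq V_i^{(\pi_i^*, \pi_{-i})}$): for any Nash equilibrium $\pi$ and optimum $\pi^*$,

\[V^{\pi}_{\text{TOT}} = \sum_i V_i^{\pi} \leq \sum_i V_i^{(\pi_i^*, \pi_{-i})} \leq \lambda V^{\pi^*}_{\text{TOT}} + \mu V^{\pi}_{\text{TOT}} \;\Longrightarrow\; \frac{V^{\pi}_{\text{TOT}}}{V^{\pi^*}_{\text{TOT}}} \leq \frac{\lambda}{1-\mu}\]

The first inequality only uses the fact that no player gains by deviating unilaterally to $\pi_i^*$, which also holds in expectation under a coarse correlated equilibrium; that is why the same $\frac{\lambda}{1-\mu}$ bound extends to the price of total anarchy.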

Concurrently, two other notions of Nash equilibrium might be relevant:

  • Well-supported Nash Equilibrium: a policy is an $\epsilon$-well-supported Nash equilibrium if and only if, for every agent $i$, every action in the support of its policy is at most $\epsilon$ worse than a best response (see the checker sketch after this list):
\[\tilde a_i \in \text{supp}(\pi_i) \Rightarrow \max_{a_i' \in A_i} \mathbb{E}_{a_{-i} \sim \pi_{-i}}[Q_i^\pi(a_i', a_{-i})] - \mathbb{E}_{a_{-i} \sim \pi_{-i}}[Q_i^\pi(\tilde a_i, a_{-i})] < \epsilon\]
  • Strong Nash Equilibrium: a policy is a strong Nash equilibrium if no subset (coalition) of agents has a joint deviation that benefits all of its members.
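Finally, a minimal sketch (hypothetical helper, specialized to one-shot games so that $Q_i^\pi$ reduces to the expected payoff) of checking the $\epsilon$-well-supported condition:

```python
import itertools

def is_well_supported(payoff, policies, eps):
    """Check the epsilon-well-supported condition in a one-shot n-player game.

    payoff: dict mapping action profiles to payoff tuples.
    policies: per-player probability lists over their actions.
    """
    n = len(policies)

    def expected(i, a_i):
        # E_{a_{-i} ~ pi_{-i}} [ u_i(a_i, a_{-i}) ]
        total = 0.0
        others = [range(len(policies[j])) for j in range(n) if j != i]
        for rest in itertools.product(*others):
            profile, prob, k = [], 1.0, 0
            for j in range(n):
                if j == i:
                    profile.append(a_i)
                else:
                    profile.append(rest[k])
                    prob *= policies[j][rest[k]]
                    k += 1
            total += prob * payoff[tuple(profile)][i]
        return total

    for i in range(n):
        best = max(expected(i, a) for a in range(len(policies[i])))
        for a, p in enumerate(policies[i]):
            if p > 0 and best - expected(i, a) >= eps:
                return False  # a supported action is eps-suboptimal
    return True

# Uniform play in matching pennies is eps-well-supported for any eps > 0.
mp = {(0, 0): (1, -1), (0, 1): (-1, 1), (1, 0): (-1, 1), (1, 1): (1, -1)}
print(is_well_supported(mp, [[0.5, 0.5], [0.5, 0.5]], eps=0.1))  # True
```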