
3.3 Sparsifying Nash equilibria deterministically

In this section we give deterministic algorithms for “sparsifying” Nash equilibria (in the process turning them into ε-equilibria). In this way, a player with limited access to randomness, but who has access to an equilibrium mixed strategy, is able to produce a strategy with small support that can then be played.¹

3.3.1 Two players

Lipton et al. proved:

Theorem 3.3.1 (Lipton et al. [LMM03]) Let G = (R, C, n) be a two-player game, let (p, q) be a Nash equilibrium for G, and let ε > 0. There is a polynomial-time randomized procedure P such that with probability at least 1/2, the pair (P(G, p), P(G, q)) is an O(log n/ε²)-sparse ε-equilibrium for G.

¹The question of how the player may obtain an equilibrium mixed strategy is a separate and well-studied topic, but it is not the focus of this work.

The algorithm P is very simple: it amounts to sampling uniformly from the given equilibrium strategy. The analysis applies Chernoff bounds to show that the sampled strategies present the opposing players with approximately (within ε) the same weighted row- and column-sums, and hence constitute an ε-equilibrium. In our setting, since the players have limited randomness they cannot afford the above sampling (it requires at least as much randomness as simply playing (p, q)), so we derandomize the algorithm using an expander walk.
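For concreteness, the sampling step can be sketched in a few lines of Python. This is a minimal illustration of the idea behind Theorem 3.3.1, not the paper's exact procedure; the constant in the choice of t is illustrative only:

```python
import numpy as np

def lmm_sparsify(p, q, eps, rng=None):
    """Randomized sparsification in the spirit of [LMM03]: draw
    t = O(log n / eps^2) i.i.d. samples from each equilibrium strategy
    and play the resulting empirical (uniform-over-samples) distributions."""
    rng = rng or np.random.default_rng()
    n = len(p)
    t = int(np.ceil(np.log(n) / eps**2))  # illustrative constant
    p_sparse = np.bincount(rng.choice(n, size=t, p=p), minlength=n) / t
    q_sparse = np.bincount(rng.choice(n, size=t, p=q), minlength=n) / t
    return p_sparse, q_sparse             # supports of size at most t
```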

Theorem 3.1.1 (restated). Let G = (R, C, n) be a two-player game, and let (p, q) be a Nash equilibrium for G. For every ε > 0, there is a deterministic procedure P running in time poly(|G|)^{1/ε²} such that the pair (P(G, p, 1), P(G, q, 2)) is an O(log n/ε²)-sparse 4ε-equilibrium for G.

Before proving the theorem, we will give a convenient characterization of ε-equilibria:

Lemma 3.3.2 Let G = (R, C, n) be a two-player game. Define

T_p = {i | (pC)_i ≥ max_r (pC)_r − ε}
S_q = {j | (Rqᵀ)_j ≥ max_t (Rqᵀ)_t − ε}.

If supp(p) ⊆ S_q and supp(q) ⊆ T_p, then (p, q) is an ε-approximate Nash equilibrium for G.

Proof. Consider an arbitrary p′ ∈ ∆([n]). Since p′Rqᵀ is a convex combination of the (Rqᵀ)_j, it is at most max_j (Rqᵀ)_j. And since pRqᵀ is a convex combination of the (Rqᵀ)_j with supp(p) ⊆ S_q, we have pRqᵀ ≥ max_j (Rqᵀ)_j − ε. Thus p′Rqᵀ ≤ pRqᵀ + ε. Similarly, for an arbitrary q′ ∈ ∆([n]) we have pCq′ᵀ ≤ max_i (pC)_i, and pCqᵀ ≥ max_i (pC)_i − ε since supp(q) ⊆ T_p. Thus pCq′ᵀ ≤ pCqᵀ + ε. These two conditions guarantee that (p, q) is an ε-approximate Nash equilibrium.
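The lemma's hypothesis is also easy to test mechanically. A minimal sketch, assuming the payoff matrices R, C are numpy arrays with entries in [−1, 1] (the tolerance parameter is my own addition for floating-point safety):

```python
import numpy as np

def satisfies_lemma_332(R, C, p, q, eps, tol=1e-12):
    """Check the sufficient condition of Lemma 3.3.2: supp(p) is contained
    in S_q and supp(q) in T_p, i.e. every pure strategy each player uses
    is within eps of a best response to the opponent's mixed strategy."""
    row_payoffs = R @ q          # (Rq^T)_j for each pure row strategy j
    col_payoffs = p @ C          # (pC)_i   for each pure column strategy i
    S_q = row_payoffs >= row_payoffs.max() - eps - tol
    T_p = col_payoffs >= col_payoffs.max() - eps - tol
    return bool(np.all(S_q[p > 0]) and np.all(T_p[q > 0]))
```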

We will use the Chernoff bound for random walks on an expander:

Theorem 3.3.3 (Gillman [Gil93]) Let H be an expander graph with second largest eigenvalue λ and vertex set V, and let f : V → [−1, 1] be arbitrary with E[f] = µ. Let X₁, X₂, …, X_t be the random variables induced by first picking X₁ uniformly in V and X₂, …, X_t by taking a random walk in H from X₁. Then

Pr[ |(1/t) Σ_i f(X_i) − µ| > δ ] < e^{−O((1−λ)δ²t)}.
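As a quick sanity check (my own demo, not part of the proof), the concentration is easy to observe numerically; here the complete graph K_m stands in for the expander, since its normalized second eigenvalue has magnitude 1/(m − 1), making 1 − λ essentially 1:

```python
import numpy as np

rng = np.random.default_rng(0)
m, t, trials, delta = 64, 200, 2000, 0.1
f = rng.choice([-1.0, 1.0], size=m)       # an arbitrary f : V -> [-1, 1]
mu = f.mean()
deviations = []
for _ in range(trials):
    v = rng.integers(m)                   # X_1 uniform in V
    total = f[v]
    for _ in range(t - 1):
        v = (v + rng.integers(1, m)) % m  # uniform step in K_m (never stays put)
        total += f[v]
    deviations.append(abs(total / t - mu))
print("empirical Pr[|avg - mu| > delta] =", np.mean(np.array(deviations) > delta))
```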

Proof (of Theorem 3.1.1). When we are given G and p, we perform the following steps.

First, construct a multiset S of [n] for which uniformly sampling from S approximates p to within ε/n. This can be done with |S| ≤ O(n/ε). Denote by p̃ the distribution induced by sampling uniformly from S. We identify S with the vertices of a constant-degree expander H, and we sample S′ ⊆ S by taking a walk of t = O(log n/ε²) steps in H. Note that this requires log|S| + O(t) = O(log n/ε²) random bits. Let p′ be the probability distribution induced by sampling uniformly from S′. By Theorem 3.3.3 (and using the fact that C has entries in [−1, 1]), for each fixed i,

Pr[|(p′C)_i − (p̃C)_i| ≥ ε] ≤ e^{−O(ε²t)} < 1/n.    (3.1)

By a union bound, |(p′C)_i − (p̃C)_i| ≤ ε for all i with non-zero probability. This condition can be checked given G, p, and ε, so we can derandomize the procedure completely by trying all choices of the random bits used in the expander walk.
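Both steps are concrete enough to sketch in Python. Below, build_multiset is one simple rounding scheme achieving |S| = O(n/ε) (others work equally well), and derandomized_walk performs the exhaustive enumeration; neighbors(v) is a hypothetical placeholder for the adjacency list of an explicit constant-degree expander on S, which a real implementation would have to supply:

```python
import numpy as np
from itertools import product

def build_multiset(p, eps):
    """Multiset S over [n] whose uniform distribution approximates p to
    within eps/n per coordinate, with |S| = m = ceil(n/eps): take
    floor(p_i * m) copies of i, then hand the leftover copies to the
    largest remainders (per-coordinate error is then <= 1/m <= eps/n)."""
    n, m = len(p), int(np.ceil(len(p) / eps))
    counts = np.floor(p * m).astype(int)
    for i in np.argsort(p * m - counts)[::-1][: m - counts.sum()]:
        counts[i] += 1
    return np.repeat(np.arange(n), counts)

def derandomized_walk(S, C, p_tilde, eps, t, neighbors):
    """Try every seed (start vertex plus one neighbor index per step) of a
    t-step walk on the expander, and return the first induced distribution
    p' with |(p'C)_i - (p~C)_i| <= eps for all i.  With |S| * deg^(t-1)
    seeds and t = O(log n / eps^2), this is n^(O(1/eps^2)) time overall."""
    n, deg, target = C.shape[0], len(neighbors(0)), p_tilde @ C
    for start, steps in product(range(len(S)), product(range(deg), repeat=t - 1)):
        walk = [start]
        for d in steps:
            walk.append(neighbors(walk[-1])[d])
        p_prime = np.bincount([S[v] for v in walk], minlength=n) / t
        if np.max(np.abs(p_prime @ C - target)) <= eps:
            return p_prime
    return None  # unreachable once t is large enough for the union bound
```

Here p̃ is the uniform distribution over S, i.e. np.bincount(S, minlength=n) / len(S); running the same routine with Rᵀ in place of C handles the column player.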

When we are given G and q, we perform essentially the same procedure (with respect to R and q), and in the end we output a pair (p′, q′) for which:

|(p′C)_i − (p̃C)_i| ≤ ε  ∀i
|(Rq′ᵀ)_j − (Rq̃ᵀ)_j| ≤ ε  ∀j

We claim that any such (p′, q′) is a 4ε-equilibrium, assuming (p, q) is an equilibrium. Using the fact that C, R have entries in [−1, 1], and the fact that our multiset approximations to p, q have error at most ε/n in each coordinate, we obtain:

|(p̃C)_i − (pC)_i| ≤ ε  ∀i
|(Rq̃ᵀ)_j − (Rqᵀ)_j| ≤ ε  ∀j

Define (as in Lemma 3.3.2):

T_{p′} = {i | (p′C)_i ≥ max_r (p′C)_r − 4ε}
S_{q′} = {j | (Rq′ᵀ)_j ≥ max_t (Rq′ᵀ)_t − 4ε}.

Now, w ∈ supp(p′) implies w ∈ supp(p), which implies (Rqᵀ)_w = max_j (Rqᵀ)_j (since (p, q) is a Nash equilibrium). From the above we have that max_j (Rq′ᵀ)_j ≤ max_j (Rqᵀ)_j + 2ε and that (Rq′ᵀ)_w ≥ (Rqᵀ)_w − 2ε. So (Rq′ᵀ)_w ≥ max_j (Rq′ᵀ)_j − 4ε, and hence w is in S_{q′}. We conclude that supp(p′) ⊆ S_{q′}. A symmetric argument shows that supp(q′) ⊆ T_{p′}. Applying Lemma 3.3.2 (with ε replaced by 4ε), we conclude that (p′, q′) is a 4ε-equilibrium, as required.

Since an equilibrium can be found efficiently by linear programming in the two-player zero-sum case, we obtain as a corollary:

Corollary 3.1.2 (restated). Let G = (R, C, n) be a two-player zero-sum game. For every ε > 0, there is a deterministic procedure running in time poly(|G|)^{1/ε²} that outputs a pair (p, q) that is an O(log n/ε²)-sparse ε-equilibrium for G.
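For completeness, here is one standard LP formulation of the zero-sum computation, sketched with scipy (variable names are mine): the row player maximizes the game value v subject to every pure counter-strategy of the opponent giving payoff at least v.

```python
import numpy as np
from scipy.optimize import linprog

def zero_sum_equilibrium(R):
    """Maximin strategy for the row player of the zero-sum game (R, -R):
    maximize v subject to p^T R >= v*1, p in Delta([n]).  The column
    player's strategy comes from the same LP applied to -R^T."""
    n, m = R.shape
    c = np.zeros(n + 1); c[-1] = -1.0            # linprog minimizes, so minimize -v
    A_ub = np.hstack([-R.T, np.ones((m, 1))])    # v - (p^T R)_j <= 0 for all j
    b_ub = np.zeros(m)
    A_eq = np.hstack([np.ones((1, n)), np.zeros((1, 1))])  # sum(p) = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=np.array([1.0]),
                  bounds=[(0, None)] * n + [(None, None)])
    return res.x[:n], res.x[-1]                  # (strategy p, game value v)
```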

3.3.2 Three or more players

We extend the algorithm above to make it work for games involving three or more players.

Theorem 3.3.4 Let G = (T₁, T₂, …, T_ℓ, n) be an ℓ-player game, and let (p₁, p₂, …, p_ℓ) be a Nash equilibrium for G. For every ε > 0, there is a deterministic procedure P running in time poly(|G|)^{1/ε²}, such that the tuple (P(G, p₁, 1), P(G, p₂, 2), …, P(G, p_ℓ, ℓ)) is an O((ℓ log n)/ε²)-sparse 4ε-equilibrium for G.

Proof. The proof is almost identical to that of Theorem 3.1.1. When given (G, p₁, 1), P samples t = O(((ℓ − 1) log n + log ℓ)/ε²) strategies from player 1's multiset of strategies after identifying it with a constant-degree expander H and doing a t-step random walk on it. Let p̃₁ be the distribution obtained by sampling uniformly from the original multiset of strategies, and p̂₁ the distribution induced by sampling uniformly from the t strategies picked by the random walk. For any fixing of (i₂, …, i_ℓ) and j, the Chernoff bound in (3.1) now becomes

Pr[|T_j(p̂₁, i₂, …, i_ℓ) − T_j(p̃₁, i₂, …, i_ℓ)| ≥ ε] ≤ e^{−O(ε²t)} < 1/(ℓn^{ℓ−1}).

By a union bound over all ℓn^{ℓ−1} possible choices of (i₂, …, i_ℓ) and j,

|T_j(p̂₁, i₂, …, i_ℓ) − T_j(p̃₁, i₂, …, i_ℓ)| < ε

with positive probability. As before, we can derandomize the procedure completely by trying all choices of the random bits used in the expander walk.
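The condition being certified is again mechanically checkable. A sketch, assuming each payoff tensor T_j is an ℓ-dimensional numpy array with player 1's strategy on axis 0:

```python
import numpy as np

def deviation_ok(tensors, p_hat, p_tilde, eps):
    """Check |T_j(p_hat, i_2..i_l) - T_j(p_tilde, i_2..i_l)| < eps for every
    player j and every fixing (i_2, ..., i_l): contracting axis 0 with
    (p_hat - p_tilde) yields all l * n^(l-1) differences at once."""
    for T_j in tensors:
        diffs = np.tensordot(p_hat - p_tilde, T_j, axes=([0], [0]))
        if np.max(np.abs(diffs)) >= eps:
            return False
    return True
```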

Essentially the same procedure gives us p̂₂, p̂₃, …, p̂_ℓ. We will first need a generalization of Lemma 3.3.2.

Lemma 3.3.5 Let G = (T₁, …, T_ℓ, n) be an ℓ-player game, and (p₁, …, p_ℓ) an arbitrary mixed-strategy profile. Define, for each i,

S_i = {j | T_i(p₁, …, p_{i−1}, j, p_{i+1}, …, p_ℓ) ≥ max_r T_i(p₁, …, p_{i−1}, r, p_{i+1}, …, p_ℓ) − ε}.

If supp(p_i) ⊆ S_i for all i = 1, …, ℓ, then (p₁, …, p_ℓ) is an ε-approximate Nash equilibrium for G. In particular, (p₁, …, p_ℓ) is a Nash equilibrium if it is a 0-approximate Nash equilibrium for G.

Proof. As before, for player i we consider an arbitrary p′ ∈ ∆([n]). Note that T_i(p₁, …, p_{i−1}, p′, p_{i+1}, …, p_ℓ) is a convex combination of the n terms

T_i(p₁, …, p_{i−1}, j, p_{i+1}, …, p_ℓ),   j = 1, …, n.

Therefore,

T_i(p₁, …, p_{i−1}, p′, p_{i+1}, …, p_ℓ) ≤ max_r T_i(p₁, …, p_{i−1}, r, p_{i+1}, …, p_ℓ).    (3.2)

p_i is also a convex combination of the n terms above, with supp(p_i) ⊆ S_i, and so:

T_i(p₁, …, p_{i−1}, p_i, p_{i+1}, …, p_ℓ) ≥ max_r T_i(p₁, …, p_{i−1}, r, p_{i+1}, …, p_ℓ) − ε.    (3.3)

Combining (3.2) and (3.3), we have:

T_i(p₁, …, p_{i−1}, p′, p_{i+1}, …, p_ℓ) ≤ T_i(p₁, …, p_ℓ) + ε.

Applying an identical argument for all ℓ players, we have that (p₁, …, p_ℓ) is an ε-approximate Nash equilibrium.
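Like Lemma 3.3.2, this condition can be verified directly. A sketch (my own helper, assuming the profile is a list of 1-D numpy arrays and each T_i is an ℓ-dimensional numpy array with player i's own strategy on axis i):

```python
import numpy as np

def expected_payoff(T, mixed):
    """Contract a payoff tensor with a list of mixed strategies, one per
    remaining axis (last axis first)."""
    for p in reversed(mixed):
        T = T @ p                    # matmul with a 1-D array sums out the last axis
    return T

def satisfies_lemma_335(tensors, profile, eps, tol=1e-12):
    """Check supp(p_i) is contained in S_i for every player i, per Lemma 3.3.5."""
    for i, (T_i, p_i) in enumerate(zip(tensors, profile)):
        others = profile[:i] + profile[i + 1:]
        T = np.moveaxis(T_i, i, 0)   # put player i's own strategy axis first
        vals = np.array([expected_payoff(T[j], others) for j in range(T.shape[0])])
        if not np.all(vals[p_i > 0] >= vals.max() - eps - tol):
            return False
    return True
```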

To show that (p̂₁, …, p̂_ℓ) constitutes a 4ε-equilibrium, consider the set S₁ = {i | T₁(i, p̂₂, …, p̂_ℓ) ≥ max_r T₁(r, p̂₂, …, p̂_ℓ) − 4ε}. Define S_j analogously with respect to T_j. Given Lemma 3.3.5, it suffices to show that supp(p̂_j) ⊆ S_j for all j. We sketch the argument for p̂₁; symmetric arguments hold for the remaining p̂_j.

By the same thread of reasoning as in the two-player case, for any w ∈ supp(p̂₁),

T₁(w, p̂₂, …, p̂_ℓ) ≥ T₁(w, p₂, …, p_ℓ) − 2ε,

and since supp(p̂₁) ⊆ supp(p₁),

T₁(w, p₂, …, p_ℓ) = max_i T₁(i, p₂, …, p_ℓ) ≥ max_i T₁(i, p̂₂, …, p̂_ℓ) − 2ε.

Combining the two inequalities, we get T₁(w, p̂₂, …, p̂_ℓ) ≥ max_i T₁(i, p̂₂, …, p̂_ℓ) − 4ε, and hence w ∈ S₁.