
3.3 Sparsifying Nash equilibria deterministically

In this section we give deterministic algorithms for “sparsifying” Nash equilibria (in the process turning them into ε-equilibria). In this way, a player with limited access to randomness, but who has access to an equilibrium mixed strategy, is able to produce a strategy with small support that can then be played.¹

3.3.1 Two players

Lipton et al. proved:

Theorem 3.3.1 (Lipton et al. [LMM03]) Let G = (R, C, n) be a two-player game, let (p, q) be a Nash equilibrium for G, and let ε > 0. There is a polynomial-time randomized procedure P such that with probability at least 1/2, the pair (P(G, p), P(G, q)) is an O(log n/ε²)-sparse ε-equilibrium for G.

¹The question of how the player may obtain an equilibrium mixed strategy is a separate and well-studied topic, but it is not the focus of this work.

The algorithm P is very simple: it amounts to sampling uniformly from the given equilibrium strategy. The analysis applies Chernoff bounds to show that the sampled strategies present the opposing players with approximately (within ε) the same weighted row- and column-sums, and hence constitute an ε-equilibrium. In our setting, since the players have limited randomness they cannot afford the above sampling (it requires at least as much randomness as simply playing (p, q)), so we derandomize the algorithm using an expander walk.
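For concreteness, the sampling step can be sketched in a few lines of Python. This is a minimal illustration of the idea behind Theorem 3.3.1, not the paper's exact procedure; the constant in the choice of t is illustrative only:

```python
import numpy as np

def lmm_sparsify(p, q, eps, rng=None):
    """Randomized sparsification in the spirit of [LMM03]: draw
    t = O(log n / eps^2) i.i.d. samples from each equilibrium strategy
    and play the resulting empirical (uniform-over-samples) distributions."""
    rng = rng or np.random.default_rng()
    n = len(p)
    t = int(np.ceil(np.log(n) / eps**2))  # illustrative constant
    p_sparse = np.bincount(rng.choice(n, size=t, p=p), minlength=n) / t
    q_sparse = np.bincount(rng.choice(n, size=t, p=q), minlength=n) / t
    return p_sparse, q_sparse             # supports of size at most t
```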

Theorem 3.1.1 (restated). Let G = (R, C, n) be a two-player game, and let (p, q) be a Nash equilibrium for G. For every ε > 0, there is a deterministic procedure P running in time poly(|G|)^{1/ε²} such that the pair (P(G, p, 1), P(G, q, 2)) is an O(log n/ε²)-sparse 4ε-equilibrium for G.

Before proving the theorem, we will give a convenient characterization of ε-equilibria:

Lemma 3.3.2 Let G = (R, C, n) be a two-player game. Define

T_p = {i | (pC)_i ≥ max_r (pC)_r − ε}
S_q = {j | (Rqᵀ)_j ≥ max_t (Rqᵀ)_t − ε}.

If supp(p) ⊆ S_q and supp(q) ⊆ T_p, then (p, q) is an ε-approximate Nash equilibrium for G.

Proof. Consider an arbitrary p′ ∈ ∆([n]). Since p′Rqᵀ is a convex combination of the (Rqᵀ)_j, it is at most max_j (Rqᵀ)_j. And since pRqᵀ is a convex combination of the (Rqᵀ)_j with supp(p) ⊆ S_q, we have pRqᵀ ≥ max_j (Rqᵀ)_j − ε. Thus p′Rqᵀ ≤ pRqᵀ + ε. Similarly, for an arbitrary q′ ∈ ∆([n]) we have pCq′ᵀ ≤ max_i (pC)_i, and pCqᵀ ≥ max_i (pC)_i − ε since supp(q) ⊆ T_p. Thus pCq′ᵀ ≤ pCqᵀ + ε. These two conditions guarantee that (p, q) is an ε-approximate Nash equilibrium.
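The lemma's hypothesis is also easy to test mechanically. A minimal sketch, assuming the payoff matrices R, C are numpy arrays with entries in [−1, 1] (the tolerance parameter is my own addition for floating-point safety):

```python
import numpy as np

def satisfies_lemma_332(R, C, p, q, eps, tol=1e-12):
    """Check the sufficient condition of Lemma 3.3.2: supp(p) is contained
    in S_q and supp(q) in T_p, i.e. every pure strategy each player uses
    is within eps of a best response to the opponent's mixed strategy."""
    row_payoffs = R @ q          # (Rq^T)_j for each pure row strategy j
    col_payoffs = p @ C          # (pC)_i   for each pure column strategy i
    S_q = row_payoffs >= row_payoffs.max() - eps - tol
    T_p = col_payoffs >= col_payoffs.max() - eps - tol
    return bool(np.all(S_q[p > 0]) and np.all(T_p[q > 0]))
```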

We will use the Chernoff bound for random walks on an expander:

Theorem 3.3.3 (Gillman [Gil93]) Let H be an expander graph with second largest eigenvalue λ and vertex set V, and let f : V → [−1, 1] be arbitrary with E[f] = µ. Let X₁, X₂, …, X_t be the random variables induced by first picking X₁ uniformly in V and X₂, …, X_t by taking a random walk in H from X₁. Then

Pr[ |(1/t) Σ_i f(X_i) − µ| > δ ] < e^{−O((1−λ)δ²t)}.
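As a quick sanity check (my own demo, not part of the proof), the concentration is easy to observe numerically; here the complete graph K_m stands in for the expander, since its normalized second eigenvalue has magnitude 1/(m − 1), making 1 − λ essentially 1:

```python
import numpy as np

rng = np.random.default_rng(0)
m, t, trials, delta = 64, 200, 2000, 0.1
f = rng.choice([-1.0, 1.0], size=m)       # an arbitrary f : V -> [-1, 1]
mu = f.mean()
deviations = []
for _ in range(trials):
    v = rng.integers(m)                   # X_1 uniform in V
    total = f[v]
    for _ in range(t - 1):
        v = (v + rng.integers(1, m)) % m  # uniform step in K_m (never stays put)
        total += f[v]
    deviations.append(abs(total / t - mu))
print("empirical Pr[|avg - mu| > delta] =", np.mean(np.array(deviations) > delta))
```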

Proof (of Theorem 3.1.1). When we are given G and p, we perform the following steps.

First, construct a multiset S of [n] for which uniformly sampling from S approximates p to within ε/n. This can be done with |S| ≤ O(n/ε). Denote by p̃ the distribution induced by sampling uniformly from S. We identify S with the vertices of a constant-degree expander H, and we sample S′ ⊆ S by taking a walk of t = O(log n/ε²) steps in H. Note that this requires log|S| + O(t) = O(log n/ε²) random bits. Let p′ be the probability distribution induced by sampling uniformly from S′. By Theorem 3.3.3 (and using the fact that C has entries in [−1, 1]), for each fixed i,

Pr[|(p′C)_i − (p̃C)_i| ≥ ε] ≤ e^{−O(ε²t)} < 1/n.    (3.1)

By a union bound, |(p′C)_i − (p̃C)_i| ≤ ε for all i with non-zero probability. This condition can be checked given G, p, and ε, so we can derandomize the procedure completely by trying all choices of the random bits used in the expander walk.
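Both steps are concrete enough to sketch in Python. Below, build_multiset is one simple rounding scheme achieving |S| = O(n/ε) (others work equally well), and derandomized_walk performs the exhaustive enumeration; neighbors(v) is a hypothetical placeholder for the adjacency list of an explicit constant-degree expander on S, which a real implementation would have to supply:

```python
import numpy as np
from itertools import product

def build_multiset(p, eps):
    """Multiset S over [n] whose uniform distribution approximates p to
    within eps/n per coordinate, with |S| = m = ceil(n/eps): take
    floor(p_i * m) copies of i, then hand the leftover copies to the
    largest remainders (per-coordinate error is then <= 1/m <= eps/n)."""
    n, m = len(p), int(np.ceil(len(p) / eps))
    counts = np.floor(p * m).astype(int)
    for i in np.argsort(p * m - counts)[::-1][: m - counts.sum()]:
        counts[i] += 1
    return np.repeat(np.arange(n), counts)

def derandomized_walk(S, C, p_tilde, eps, t, neighbors):
    """Try every seed (start vertex plus one neighbor index per step) of a
    t-step walk on the expander, and return the first induced distribution
    p' with |(p'C)_i - (p~C)_i| <= eps for all i.  With |S| * deg^(t-1)
    seeds and t = O(log n / eps^2), this is n^(O(1/eps^2)) time overall."""
    n, deg, target = C.shape[0], len(neighbors(0)), p_tilde @ C
    for start, steps in product(range(len(S)), product(range(deg), repeat=t - 1)):
        walk = [start]
        for d in steps:
            walk.append(neighbors(walk[-1])[d])
        p_prime = np.bincount([S[v] for v in walk], minlength=n) / t
        if np.max(np.abs(p_prime @ C - target)) <= eps:
            return p_prime
    return None  # unreachable once t is large enough for the union bound
```

Here p̃ is the uniform distribution over S, i.e. np.bincount(S, minlength=n) / len(S); running the same routine with Rᵀ in place of C handles the column player.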

When we are given G and q, we perform essentially the same procedure (with respect to R and q), and in the end we output a pair (p′, q′) for which:

|(p′C)_i − (p̃C)_i| ≤ ε  ∀i
|(Rq′ᵀ)_j − (Rq̃ᵀ)_j| ≤ ε  ∀j

We claim that any such (p′, q′) is a 4ε-equilibrium, assuming (p, q) is an equilibrium. Using the fact that C, R have entries in [−1, 1], and the fact that our multiset approximations to p, q have error at most ε/n in each coordinate, we obtain:

|(p̃C)_i − (pC)_i| ≤ ε  ∀i
|(Rq̃ᵀ)_j − (Rqᵀ)_j| ≤ ε  ∀j

Define (as in Lemma 3.3.2):

T_{p′} = {i | (p′C)_i ≥ max_r (p′C)_r − 4ε}
S_{q′} = {j | (Rq′ᵀ)_j ≥ max_t (Rq′ᵀ)_t − 4ε}.

Now, w ∈ supp(p′) implies w ∈ supp(p), which implies (Rqᵀ)_w = max_j (Rqᵀ)_j (since (p, q) is a Nash equilibrium). From the above we have that max_j (Rq′ᵀ)_j ≤ max_j (Rqᵀ)_j + 2ε and that (Rq′ᵀ)_w ≥ (Rqᵀ)_w − 2ε. So (Rq′ᵀ)_w ≥ max_j (Rq′ᵀ)_j − 4ε, and hence w is in S_{q′}. We conclude that supp(p′) ⊆ S_{q′}. A symmetric argument shows that supp(q′) ⊆ T_{p′}. Applying Lemma 3.3.2 (with ε replaced by 4ε), we conclude that (p′, q′) is a 4ε-equilibrium, as required.

Since an equilibrium can be found efficiently by linear programming in the two-player zero-sum case, we obtain as a corollary:

Corollary 3.1.2 (restated). Let G = (R, C, n) be a two-player zero-sum game. For every ε > 0, there is a deterministic procedure running in time poly(|G|)^{1/ε²} that outputs a pair (p, q) that is an O(log n/ε²)-sparse ε-equilibrium for G.
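For completeness, here is one standard LP formulation of the zero-sum computation, sketched with scipy (variable names are mine): the row player maximizes the game value v subject to every pure counter-strategy of the opponent giving payoff at least v.

```python
import numpy as np
from scipy.optimize import linprog

def zero_sum_equilibrium(R):
    """Maximin strategy for the row player of the zero-sum game (R, -R):
    maximize v subject to p^T R >= v*1, p in Delta([n]).  The column
    player's strategy comes from the same LP applied to -R^T."""
    n, m = R.shape
    c = np.zeros(n + 1); c[-1] = -1.0            # linprog minimizes, so minimize -v
    A_ub = np.hstack([-R.T, np.ones((m, 1))])    # v - (p^T R)_j <= 0 for all j
    b_ub = np.zeros(m)
    A_eq = np.hstack([np.ones((1, n)), np.zeros((1, 1))])  # sum(p) = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=np.array([1.0]),
                  bounds=[(0, None)] * n + [(None, None)])
    return res.x[:n], res.x[-1]                  # (strategy p, game value v)
```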

3.3.2 Three or more players

We extend the algorithm above to make it work for games involving three or more players.

Theorem 3.3.4 Let G = (T₁, T₂, …, T_ℓ, n) be an ℓ-player game, and let (p₁, p₂, …, p_ℓ) be a Nash equilibrium for G. For every ε > 0, there is a deterministic procedure P running in time poly(|G|)^{1/ε²}, such that the tuple (P(G, p₁, 1), P(G, p₂, 2), …, P(G, p_ℓ, ℓ)) is an O((ℓ log n)/ε²)-sparse 4ε-equilibrium for G.

Proof. The proof is almost identical to that of Theorem 3.1.1. When given (G, p₁, 1), P samples t = O(((ℓ − 1) log n + log ℓ)/ε²) strategies from player 1's multiset of strategies after identifying it with a constant-degree expander H and doing a t-step random walk on it. Let p̃₁ be the distribution obtained by sampling uniformly from the original multiset of strategies, and p̂₁ the distribution induced by sampling uniformly from the t strategies picked by the random walk. For any fixing of (i₂, …, i_ℓ) and j, the Chernoff bound in (3.1) now becomes

Pr[|T_j(p̂₁, i₂, …, i_ℓ) − T_j(p̃₁, i₂, …, i_ℓ)| ≥ ε] ≤ e^{−O(ε²t)} < 1/(ℓn^{ℓ−1}).

By a union bound over all ℓn^{ℓ−1} possible choices of (i₂, …, i_ℓ) and j,

|T_j(p̂₁, i₂, …, i_ℓ) − T_j(p̃₁, i₂, …, i_ℓ)| < ε

with positive probability. As before, we can derandomize the procedure completely by trying all choices of the random bits used in the expander walk.
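The condition being certified is again mechanically checkable. A sketch, assuming each payoff tensor T_j is an ℓ-dimensional numpy array with player 1's strategy on axis 0:

```python
import numpy as np

def deviation_ok(tensors, p_hat, p_tilde, eps):
    """Check |T_j(p_hat, i_2..i_l) - T_j(p_tilde, i_2..i_l)| < eps for every
    player j and every fixing (i_2, ..., i_l): contracting axis 0 with
    (p_hat - p_tilde) yields all l * n^(l-1) differences at once."""
    for T_j in tensors:
        diffs = np.tensordot(p_hat - p_tilde, T_j, axes=([0], [0]))
        if np.max(np.abs(diffs)) >= eps:
            return False
    return True
```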

Essentially the same procedure gives us p̂₂, p̂₃, …, p̂_ℓ. We will first need a generalization of Lemma 3.3.2.

Lemma 3.3.5 Let G = (T₁, …, T_ℓ, n) be an ℓ-player game, and (p₁, …, p_ℓ) an arbitrary mixed-strategy profile. Define, for each i,

S_i = {j | T_i(p₁, …, p_{i−1}, j, p_{i+1}, …, p_ℓ) ≥ max_r T_i(p₁, …, p_{i−1}, r, p_{i+1}, …, p_ℓ) − ε}.

If supp(p_i) ⊆ S_i for all i = 1, …, ℓ, then (p₁, …, p_ℓ) is an ε-approximate Nash equilibrium for G. In particular, (p₁, …, p_ℓ) is a Nash equilibrium if it is a 0-approximate Nash equilibrium for G.

Proof. As before, for player i we consider an arbitrary p′ ∈ ∆([n]). Note that T_i(p₁, …, p_{i−1}, p′, p_{i+1}, …, p_ℓ) is a convex combination of the n terms

T_i(p₁, …, p_{i−1}, j, p_{i+1}, …, p_ℓ),   j = 1, …, n.

Therefore,

T_i(p₁, …, p_{i−1}, p′, p_{i+1}, …, p_ℓ) ≤ max_r T_i(p₁, …, p_{i−1}, r, p_{i+1}, …, p_ℓ).    (3.2)

p_i is also a convex combination of the n terms above, with supp(p_i) ⊆ S_i, and so:

T_i(p₁, …, p_{i−1}, p_i, p_{i+1}, …, p_ℓ) ≥ max_r T_i(p₁, …, p_{i−1}, r, p_{i+1}, …, p_ℓ) − ε.    (3.3)

Combining (3.2) and (3.3), we have:

T_i(p₁, …, p_{i−1}, p′, p_{i+1}, …, p_ℓ) ≤ T_i(p₁, …, p_ℓ) + ε.

Applying an identical argument for all ℓ players, we have that (p₁, …, p_ℓ) is an ε-approximate Nash equilibrium.
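Like Lemma 3.3.2, this condition can be verified directly. A sketch (my own helper, assuming the profile is a list of 1-D numpy arrays and each T_i is an ℓ-dimensional numpy array with player i's own strategy on axis i):

```python
import numpy as np

def expected_payoff(T, mixed):
    """Contract a payoff tensor with a list of mixed strategies, one per
    remaining axis (last axis first)."""
    for p in reversed(mixed):
        T = T @ p                    # matmul with a 1-D array sums out the last axis
    return T

def satisfies_lemma_335(tensors, profile, eps, tol=1e-12):
    """Check supp(p_i) is contained in S_i for every player i, per Lemma 3.3.5."""
    for i, (T_i, p_i) in enumerate(zip(tensors, profile)):
        others = profile[:i] + profile[i + 1:]
        T = np.moveaxis(T_i, i, 0)   # put player i's own strategy axis first
        vals = np.array([expected_payoff(T[j], others) for j in range(T.shape[0])])
        if not np.all(vals[p_i > 0] >= vals.max() - eps - tol):
            return False
    return True
```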

To show that (p̂₁, …, p̂_ℓ) constitutes a 4ε-equilibrium, consider the set S₁ = {i | T₁(i, p̂₂, …, p̂_ℓ) ≥ max_r T₁(r, p̂₂, …, p̂_ℓ) − 4ε}. Define S_j analogously with respect to T_j. Given Lemma 3.3.5, it suffices to show that supp(p̂_j) ⊆ S_j for all j. We sketch the argument for p̂₁; symmetric arguments hold for the remaining p̂_j.

By the same thread of reasoning as in the two-player case, for any w ∈ supp(p̂₁),

T₁(w, p̂₂, …, p̂_ℓ) ≥ T₁(w, p₂, …, p_ℓ) − 2ε,

and since supp(p̂₁) ⊆ supp(p₁),

T₁(w, p₂, …, p_ℓ) = max_i T₁(i, p₂, …, p_ℓ) ≥ max_i T₁(i, p̂₂, …, p̂_ℓ) − 2ε.

Combining the two inequalities, we get T₁(w, p̂₂, …, p̂_ℓ) ≥ max_i T₁(i, p̂₂, …, p̂_ℓ) − 4ε, and hence w ∈ S₁.