
Electronic Journal of Probability
Vol. 12 (2007), Paper no. 10, pages 262–299.
Journal URL: http://www.math.washington.edu/~ejpecp/

Robust Mixing

Murali K. Ganapathy, Google Inc.∗

Abstract

In this paper, we develop a new "robust mixing" framework for reasoning about adversarially modified Markov Chains (AMMC). Let P be the transition matrix of an irreducible Markov Chain with stationary distribution π. An adversary announces a sequence of stochastic matrices {A_t}_{t>0} satisfying πA_t = π. An AMMC process involves an application of P followed by A_t at time t. The robust mixing time of an ergodic Markov Chain P is the supremum over all adversarial strategies of the mixing time of the corresponding AMMC process. Applications include estimating the mixing times for certain non-Markovian processes and for reversible liftings of Markov Chains.

Non-Markovian card shuffling processes: The random-to-cyclic transposition process is a non-Markovian card shuffling process which, at time t, exchanges the card at position L_t := t (mod n) with a random card. Mossel, Peres and Sinclair (2004) showed a lower bound of (0.0345 + o(1)) n log n for the mixing time of the random-to-cyclic transposition process. They also considered a generalization of this process where the choice of L_t is adversarial, and proved an upper bound of Cn log n + O(n) (with C ≈ 4×10^5) on the mixing time. We reduce the constant to 1 by showing that the random-to-top transposition chain (a Markov Chain) has robust mixing time ≤ n log n + O(n) when the adversarial strategies are limited to holomorphic strategies, i.e. those strategies which preserve the symmetry of the underlying Markov Chain. We also show an O(n log² n) bound on the robust mixing time of the lazy random-to-top transposition chain when the adversary is not limited to holomorphic strategies.

Reversible liftings: Chen, Lovász and Pak showed that for a reversible ergodic Markov Chain P, any reversible lifting Q of P must satisfy T(P) ≤ T(Q) log(1/π∗), where π∗ is the minimum stationary probability. Looking at a specific adversarial strategy allows us to show that T(Q) ≥ r(P), where r(P) is the relaxation time of P. This gives an alternate proof of

the reversible lifting result and helps identify cases where reversible liftings cannot improve the mixing time by more than a constant factor.

Key words: Markov Chains, Robust mixing time, Reversible lifting, random-to-cyclic trans-position, non-Markovian processes.

AMS 2000 Subject Classification: Primary 60J10, 62M09, 62M15.

1 Introduction

In this paper, we develop a "robust mixing" framework which allows us to reason about adversarially modified Markov Chains (AMMC). This framework can be used to bound mixing times of some non-Markovian processes in terms of the robust mixing time of related Markov Chains. Another type of application is to estimate mixing times of complex Markov Chains in terms of that of simpler Markov Chains. Finally, we also use this framework to give an alternate proof of a reversible lifting result due to Chen et al. (4).

1.1 Robust Mixing

All stochastic processes considered in this paper have finite state space and run in discrete time. Let M be an irreducible Markov chain on state space X with transition probability matrix P and stationary distribution π. When the context is clear, we use P to denote the Markov Chain as well as its transition probability matrix.

Markov Chains we consider here are not assumed to be reversible, unless otherwise specified. All logarithms are taken to the base e unless otherwise specified.

Definition 1.1. Let P be the transition matrix of a Markov Chain. Its mixing time and L2-mixing time are defined by the equations

T(P, ε) = max_µ min_t {||µP^t − π||_TV ≤ ε} and T2(P, ε) = max_µ min_t {||µP^t − π||_{2,π} ≤ ε}

respectively. Here ||µ − π||_TV = Σ_x |µ(x) − π(x)|/2 is the total variation norm and ||µ − π||²_{2,π} = Σ_x (µ(x) − π(x))²/π(x) is the L²(π) norm. When ε is not specified, we take it to be 1/4 for T and 1/2 for T2.

Note that for ε < 1/2, the inequalities 2||·||_TV ≤ ||·||_{2,π} ≤ 2||·||_TV √(1/π∗) together with submultiplicativity of T(ε/2) give

T(P, ε) ≤ T2(P, 2ε) ≤ T(P, ε) log_{1/2ε}(1/π∗),   (1)

where π∗ = min_x π(x) is the minimum stationary probability.
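Definition 1.1 and inequality (1) can be checked numerically on small chains. The following sketch (assuming NumPy; the function names are ours, not the paper's) computes T(P, ε) and T2(P, ε) by brute force, using the fact that by convexity the worst initial distribution is a point mass.

```python
import numpy as np

def tv_dist(mu, pi):
    # ||mu - pi||_TV = sum_x |mu(x) - pi(x)| / 2
    return 0.5 * np.abs(mu - pi).sum()

def l2_pi_dist(mu, pi):
    # ||mu - pi||_{2,pi} = sqrt( sum_x (mu(x) - pi(x))^2 / pi(x) )
    return np.sqrt(((mu - pi) ** 2 / pi).sum())

def mixing_time(P, pi, dist, eps, t_max=10000):
    # Smallest t with max_x dist(delta_x P^t, pi) <= eps.
    n = P.shape[0]
    mus = np.eye(n)  # all point masses at once, one per row
    for t in range(1, t_max + 1):
        mus = mus @ P
        if max(dist(mus[i], pi) for i in range(n)) <= eps:
            return t
    raise RuntimeError("chain did not mix within t_max steps")

# Example: lazy random walk on a 4-cycle (uniform stationary distribution).
n = 4
P = np.zeros((n, n))
for i in range(n):
    P[i, i] = 0.5
    P[i, (i + 1) % n] = 0.25
    P[i, (i - 1) % n] = 0.25
pi = np.full(n, 1.0 / n)
```

With the default thresholds of Definition 1.1, inequality (1) predicts mixing_time(P, pi, tv_dist, 0.25) is at most mixing_time(P, pi, l2_pi_dist, 0.5).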

Definition 1.2. Let P be an irreducible Markov Chain with stationary distribution π. A stochastic matrix A (not necessarily irreducible) is said to be compatible with P if πA = π.

Note that the notion of compatibility depends only on the stationary distribution of P.

Definition 1.3. An adversarially modified Markov Chain (AMMC) P is a pair (P, {A_t}_{t>0}), where P is the transition matrix of an irreducible Markov Chain and A_t is a sequence of stochastic matrices compatible with P. Given an AMMC and an initial distribution µ_0, the AMMC process evolves as follows:

• At time t = 0, pick X_0 ∈ X according to µ_0,

• Given X_t, pick Y_t according to the distribution P(X_t, ·),

• Given Y_t, pick X_{t+1} according to the distribution A_t(Y_t, ·).

An application of P followed by A_t is called a round. Let µ_t and ν_t denote the distributions of X_t and Y_t respectively. Then µ_t is the distribution after t rounds. Also

ν_t = µ_t P and µ_{t+1} = ν_t A_t   (2)
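The round structure in (2) is straightforward to simulate exactly at the level of distributions. A minimal sketch (NumPy assumed; names ours):

```python
import numpy as np

def ammc_distribution(P, strategies, mu0):
    # One round = apply P, then the announced strategy A_t (equation (2)):
    #   nu_t = mu_t P,  mu_{t+1} = nu_t A_t
    mu = np.asarray(mu0, dtype=float)
    for A in strategies:
        mu = (mu @ P) @ A
    return mu

# Example: lazy walk on a 3-cycle with uniform pi; the adversary applies a
# cyclic-shift permutation, which is doubly stochastic, hence compatible.
P = np.array([[0.5, 0.25, 0.25],
              [0.25, 0.5, 0.25],
              [0.25, 0.25, 0.5]])
A = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [1., 0., 0.]])
pi = np.full(3, 1.0 / 3)
```

Since πA_t = π for every t, π is a fixed point of each round, exactly as compatibility demands.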

Definition 1.4. Let P be an AMMC. Its mixing time and L2-mixing time are defined by the equations

T(P, ε) = max_{µ_0} min_t {||µ_t − π||_TV ≤ ε} and T2(P, ε) = max_{µ_0} min_t {||µ_t − π||_{2,π} ≤ ε}

respectively. When ε is not specified, we take it to be 1/4 for T and 1/2 for T2.

The proof of (1) together with submultiplicativity of T(P) (Theorem 1.8) shows that for ε < 1/2,

T(P, ε) ≤ T2(P, 2ε) ≤ T(P, ε) log_{1/2ε}(1/π∗)   (3)

Definition 1.5. Let P be an irreducible Markov Chain. An adversarially modified version of P is an AMMC (P, {A_t}_{t>0}).

Definition 1.6. Let P be an ergodic Markov Chain. The robust mixing time and robust L2-mixing time of P are defined by the equations

R(P, ε) = sup_P T(P, ε) and R2(P, ε) = sup_P T2(P, ε)

respectively, where the suprema are taken over adversarially modified versions P of P. When P is clear from context, we drop it, and when ε is not specified we take it to be 1/4 for R and 1/2 for R2.

The set of stochastic matrices compatible with P is a bounded polytope and hence the convex hull of its vertices. Since the distances used to measure the mixing time are convex, it follows that the worst case for robust mixing time is achieved when each A_t is a vertex of the polytope. If the stationary distribution is uniform, the polytope is called the assignment polytope and its vertices are permutation matrices.
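For uniform π this convexity argument can be illustrated directly: sample a doubly stochastic matrix as a convex combination of permutation matrices and check that the one-round distance it achieves never beats the best vertex. A small sketch (NumPy; names ours):

```python
import itertools
import numpy as np

def perm_matrices(n):
    # Vertices of the assignment polytope: all n! permutation matrices.
    for p in itertools.permutations(range(n)):
        A = np.zeros((n, n))
        A[np.arange(n), p] = 1.0
        yield A

def one_round_tv(P, A, mu, pi):
    # TV distance from pi after one round: apply P, then the strategy A.
    return 0.5 * np.abs((mu @ P) @ A - pi).sum()

n = 3
P = np.array([[0.5, 0.25, 0.25],
              [0.25, 0.5, 0.25],
              [0.25, 0.25, 0.5]])
pi = np.full(n, 1.0 / n)
mu = np.array([1.0, 0.0, 0.0])

verts = list(perm_matrices(n))
best_vertex = max(one_round_tv(P, A, mu, pi) for A in verts)

rng = np.random.default_rng(0)
for _ in range(100):
    w = rng.dirichlet(np.ones(len(verts)))        # random point of the polytope
    A = sum(wi * Ai for wi, Ai in zip(w, verts))  # doubly stochastic matrix
    assert one_round_tv(P, A, mu, pi) <= best_vertex + 1e-12
```

The loop never fires the assertion: by convexity of TV distance, no interior point of the polytope can exceed the worst vertex.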

When we need to distinguish between the standard notion of mixing time and robust mixing time, we refer to the standard notion as “standard mixing time.”

One can think of the standard mixing time of a Markov Chain as the number of (contiguous) applications of P required to get close to stationarity. In the same vein, the robust mixing time is the number of not necessarily contiguous applications of P required to get close to stationarity under reasonable assumptions on the intervening steps. Our adversary is oblivious. See Section 1.7 for related discussion. If π is the uniform distribution, then compatibility is equivalent to requiring that the A_t are doubly stochastic.

Definition 1.7. Let R = {R_t}_{t>0} be a stochastic process on a state space X and P an irreducible Markov Chain. We say that R can be simulated by an adversarially modified P if there is an adversarially modified version P of P such that for every initial distribution ν_0 of R_0, there is an initial distribution µ_0 of P such that the distributions ν_t of R_t and µ_t are equal.

1.2 Properties of robust mixing time

Like standard mixing time, robust mixing time is also submultiplicative.

Theorem 1.8. (Submultiplicativity) Let P be an ergodic Markov Chain. For ε, δ > 0,

R(P, εδ/2) ≤ R(P, ε/2) + R(P, δ/2) and R2(P, εδ) ≤ R2(P, ε) + R2(P, δ)

This will be proved in Section 2.1. A useful property enjoyed by robust mixing time not shared by the standard mixing time is the following convexity property.

Theorem 1.9. (Convexity) Let P be an ergodic Markov Chain with stationary distribution π and Q any Markov Chain compatible with P. Let 0 < a = 1 − b < 1 and π∗ = min_x π(x). Then R(PQ, 1/4) ≤ R(P, 1/4) and R2(PQ, 1/2) ≤ R2(P, 1/2). Also,

R(aP + bQ, 1/4) ≤ R(P, 1/4) + R(Q, 1/4) − 1 and R2(aP + bQ, 1/2) ≤ R2(P, 1/2) + R2(Q, 1/2) − 1

Moreover,

• if R(P, 1/4) ≥ 11, then R(aP + bQ, 1/4) ≤ 3R(P, 1/4)/a,

• if π∗ ≤ 1/16 and R2(P, 1/2) ≥ log(1/π∗)/2, then R2(aP + bQ) ≤ 7R(P)/a.

Theorem 1.9 is proved in Section 2.3. Convex combinations of Markov Chains were considered in (3) to sample linear orderings. For reversible chains P and Q, using standard results, one can show that a convex combination of P and Q mixes in time O(min(T(P), T(Q)) log(1/π∗)). Our result allows us to eliminate the log(1/π∗) factor under some conditions.

1.3 Relation to classical parameters of Markov Chains

We now relate the robust mixing time of Markov chains to classical mixing parameters.

Definition 1.10. Let P be an ergodic chain with stationary distribution π.

• Denote by π∗ the smallest entry of π, i.e. π∗ = min_x π(x),

• Let Π denote the diagonal matrix with entries given by π, i.e. Π(x, x) = π(x),

• S(P) = √Π P √Π⁻¹,

• ←P = Π⁻¹ Pᵀ Π denotes the reverse of the Markov Chain P, where Pᵀ denotes the transpose of the matrix P.
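The objects in Definition 1.10 are one-liners in NumPy. The sketch below (function names ours) builds Π, S(P) = √Π P √Π⁻¹ and ←P = Π⁻¹ Pᵀ Π for a small chain, and checks the identity S(P)S(P)ᵀ = S(P←P) that is used in Section 2.

```python
import numpy as np

def stationary(P):
    # Left Perron vector of P, normalized to a probability distribution.
    w, V = np.linalg.eig(P.T)
    v = np.real(V[:, np.argmax(np.real(w))])
    v = np.abs(v)
    return v / v.sum()

def symmetrize_and_reverse(P):
    pi = stationary(P)
    s = np.sqrt(pi)
    S = (s[:, None] * P) / s[None, :]        # S(P) = sqrt(Pi) P sqrt(Pi)^{-1}
    rev = (P.T * pi[None, :]) / pi[:, None]  # reverse chain Pi^{-1} P^T Pi
    return pi, S, rev

# A non-reversible (biased, lazy) walk on a 3-cycle.
P = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.5, 0.4],
              [0.4, 0.1, 0.5]])
pi, S, rev = symmetrize_and_reverse(P)
```

Entrywise, ←P(x, y) = π(y)P(y, x)/π(x), so ←P is stochastic with the same stationary distribution as P; this is what the test below verifies.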

Definition 1.11. Let A be any N×N real matrix. By a singular value decomposition of A, we mean two orthonormal bases {x_0, . . . , x_{N−1}} and {y_0, . . . , y_{N−1}} and scalars σ_0 ≥ σ_1 ≥ · · · ≥ σ_{N−1} ≥ 0 which satisfy

x_i A = σ_i y_i and y_i Aᵀ = σ_i x_i   (4)

The σ_i are called the singular values and the x_i and y_i are called the left and right singular vectors respectively.

See Horn and Johnson (14, Chapter 3) for results about singular values. If A is the transition matrix of a reversible chain or a real symmetric matrix, let λ_0(A) ≥ λ_1(A) ≥ · · · ≥ λ_{N−1}(A) denote its eigenvalues and put λ∗(A) = max(|λ_1(A)|, |λ_{N−1}(A)|).

Definition 1.12. Let P be an ergodic reversible Markov Chain. Its relaxation time is defined by

r(P) = −1 / log λ∗(P)

Note that for any ergodic reversible Markov Chain P, r(P) ≤ T(P) ≤ r(P)(log(1/π∗)/2 + 1). Just like how the mixing time of a reversible chain is determined by its largest eigenvalue (in modulus) up to a log(1/π∗) factor, the robust mixing time of a Markov Chain (not necessarily reversible) is determined by the second largest singular value of S(P) up to a log(1/π∗) factor. More specifically, we have

Theorem 1.13. Let P be an ergodic Markov Chain with stationary distribution π. Then

r(P←P) ≤ max(T(P←P), T(P)) ≤ R(P) ≤ 2r(P←P)(log(1/π∗)/2 + 1)

In particular if P is reversible, we have

r(P) ≤ T(P) ≤ R(P) ≤ r(P)(log(1/π∗)/2 + 1)

In Section 4, we show that many of the techniques used to prove upper bounds on mixing time actually give bounds on the robust mixing time. These include eigenvalue estimation, conductance methods, log-Sobolev inequalities and most analytical methods. The most notable exception is coupling. Mixing time bounds established via coupling do not automatically lead to bounds on robust mixing time. However in certain cases, they lead to bounds on robust mixing time against certain types of restricted adversaries.

1.4 Cayley walks with restricted adversaries

We now turn to Markov Chains induced by walks on groups.

Definition 1.14. Let G be a finite group and P a probability distribution over G. By a Cayley walk on G induced by P we mean a Markov Chain on G with transition probability matrix P given by P(h, h·s) = P(s) for all h, s ∈ G. By a Cayley walk on G, we mean a Cayley walk on G induced by P for some probability distribution P over G.
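As a concrete instance of Definition 1.14, the following sketch (names ours) builds the transition matrix of a Cayley walk on the cyclic group Z_n; since every Cayley walk has uniform stationary distribution, the resulting matrix is doubly stochastic.

```python
import numpy as np

def cayley_walk_Zn(n, step_probs):
    # P(h, h + s mod n) = step_probs[s]: right "multiplication" by the
    # step s in the additive group Z_n.
    P = np.zeros((n, n))
    for h in range(n):
        for s, p in step_probs.items():
            P[h, (h + s) % n] += p
    return P

# Lazy walk on Z_6 driven by the uniform distribution on {0, +1, -1}.
P = cayley_walk_Zn(6, {0: 1/3, 1: 1/3, 5: 1/3})
```

The same constructor, with a different group multiplication table, would produce Cayley walks on any finite group; Z_n keeps the example short.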

In the case of a Cayley walk, one can look at the robust mixing time when the adversary's strategies are limited to those preserving the symmetries of the group. We consider two natural restrictions.

Definition 1.15. Let P be a Cayley walk on a group G. A Cayley strategy is the transition matrix of a Cayley walk on G. Denote by C the set of all Cayley strategies on G. A Cayley adversary is one who is limited to Cayley strategies.

Note that a Cayley adversary at time t is only allowed to right multiply the current group element g by a group element s chosen from some distribution P_t on G. See Section 5.2 for more discussion on the power of a Cayley adversary.

Definition 1.16. Let G be a group. A permutation J on G is said to be a holomorphism if it can be written as the composition of

• right/left multiplications by elements of G, and

• automorphisms of G.

Definition 1.17. Let P be a Cayley walk on a group G. A holomorphic strategy is a doubly stochastic matrix A which can be written as a convex combination of holomorphisms of G. Denote by H the set of all holomorphic strategies of G (G will be clear from the context). A holomorphic adversary is one who is limited to holomorphic strategies.

We now turn to defining the robust mixing time against restricted adversaries.

Definition 1.18. Let P be an irreducible Markov Chain. A set S of stochastic matrices is said to be a valid set of strategies against P if it satisfies the following:

• I ∈ S,

• A ∈ S implies A is compatible with P,

• S is closed under products and convex combinations.

Definition 1.19. Let P be an irreducible Markov Chain and S a valid set of strategies against P. The S-robust mixing time and S-robust L2-mixing time are defined by the equations

R^S(P, ε) = sup_P T(P, ε) and R^S_2(P, ε) = sup_P T2(P, ε)

where P = (P, {A_t}_{t>0}) ranges over adversarially modified versions of P with A_t ∈ S for all t. In case P is a Cayley walk on a group G, define the holomorphic robust mixing time and holomorphic robust L2-mixing time by taking S = H. Similarly, taking S = C, define the Cayley robust mixing time and Cayley robust L2-mixing time.

Theorem 1.8 as well as Theorem 1.9 can be extended to work with any valid set of strategies against P. Hence we also have the following.

Theorem 1.20. (Submultiplicativity for Cayley walks) Let P be an ergodic Cayley walk on a group G. For ε, δ > 0 and R′ ∈ {R^C, R^H}, we have

R′(P, εδ/2) ≤ R′(P, ε/2) + R′(P, δ/2) and R′_2(P, εδ) ≤ R′_2(P, ε) + R′_2(P, δ)

Theorem 1.21. (Convexity for Cayley walks) Let P be an ergodic Cayley walk on a group G and Q any Cayley walk on G. Let 0 < a = 1 − b < 1 and R′ ∈ {R^C, R^H}. Then R′(PQ, 1/4) ≤ R′(P, 1/4) and R′_2(PQ, 1/2) ≤ R′_2(P, 1/2). Also,

R′(aP + bQ, 1/4) ≤ R′(P, 1/4) + R′(Q, 1/4) − 1 and R′_2(aP + bQ, 1/2) ≤ R′_2(P, 1/2) + R′_2(Q, 1/2) − 1

Moreover,

• if R′(P, 1/4) ≥ 11, then R′(aP + bQ, 1/4) ≤ 3R′(P, 1/4)/a,

• if |G| ≥ 16 and R′_2(P, 1/2) ≥ log(|G|)/2, then R′_2(aP + bQ) ≤ 7R′(P)/a.

Submultiplicativity and Convexity for Cayley walks are proved in Section 2.1 and Section 2.3 respectively.

Example 1. Suppose G is a finite group generated by S ⊆ G. Let T ⊆ G be arbitrary. Let P_S and P_{S∪T} denote the Cayley walks on G driven by the uniform distributions on S and S∪T, respectively. Assume also that for R′ ∈ {R, R^H, R^C} we have R′(P_S) ≥ 11. Then writing

P_{S∪T} = (|S| / (|S|+|T|)) P_S + (|T| / (|S|+|T|)) P_T

allows us to apply Theorem 1.21 to infer

R′(P_{S∪T}, 1/4) ≤ 3 · ((|S|+|T|) / |S|) · R′(P_S, 1/4) = 3 (1 + |T|/|S|) R′(P_S, 1/4)

Thus we can remove some problematic generators while estimating the robust mixing time.

Definition 1.22. Let X be a connected undirected graph and s a special vertex, called the sink.

• A configuration f consists of a non-negative integer f(u) associated with each non-sink vertex u of X. Vertex u is said to have f(u) grains.

• A configuration f is said to be stable iff f(u) < d(u) for all vertices u ≠ s, where d(u) is the degree of u.

• Suppose f is an unstable configuration and u ≠ s is such that f(u) ≥ d(u). By toppling f at u we mean removing d(u) grains from u and adding one grain to each non-sink neighbor v of u.

Note that the total number of grains in the system reduces exactly when a neighbor of the sink topples. (5) shows that by repeatedly toppling an unstable configuration one can reach a stable configuration (since X is connected). Moreover, the final stable configuration is independent of the order in which vertices were toppled.

• Given two stable configurations f and g, define f ⊕ g to be the stable configuration obtained after toppling the configuration with f(u) + g(u) grains at each vertex u.

• A stable configuration f is said to be recurrent if for any configuration g there exists a configuration h for which f = g ⊕ h.

• The set of all stable recurrent configurations forms an abelian group, called the Sandpile group of X.


Example 2. The natural Markov Chain on the Sandpile group G of a graph X with sink s is the following: Given a stable recurrent configuration f, pick a vertex u ≠ s and move to the configuration f ⊕ e_u, where e_u is the configuration with exactly one grain at u.

If we take X_n = K_{n+2} (with one of the vertices being the sink), the Sandpile group G_n = ⊕_n C_n is the product of n copies of the cyclic group C_n.

The n+1 generators of the natural Markov Chain on G_n are e_1, . . . , e_n and f = −(e_1 + e_2 + · · · + e_n), where e_i is 1 at the i'th coordinate and zero elsewhere. Here taking S = {e_1, . . . , e_n} and T = {f} allows us to estimate the mixing time of this Markov Chain by eliminating the generator f.
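Toppling and the ⊕ operation of Definition 1.22 are easy to implement. The sketch below (names ours) stabilizes configurations on the triangle K_3 with one vertex as sink; it illustrates the mechanics only, not the K_{n+2} chain above.

```python
def stabilize(config, degree, neighbors, sink):
    # Repeatedly topple any unstable non-sink vertex u (f(u) >= d(u)):
    # remove d(u) grains from u and add one grain to each non-sink
    # neighbor. Grains sent to the sink leave the system.
    f = dict(config)
    while True:
        unstable = [u for u in f if u != sink and f[u] >= degree[u]]
        if not unstable:
            return f
        u = unstable[0]
        f[u] -= degree[u]
        for v in neighbors[u]:
            if v != sink:
                f[v] += 1

def oplus(f, g, degree, neighbors, sink):
    # f ⊕ g: add grains coordinatewise, then stabilize.
    h = {u: f[u] + g[u] for u in f}
    return stabilize(h, degree, neighbors, sink)

# Triangle K_3 with vertex 0 as the sink; non-sink vertices 1 and 2.
degree = {1: 2, 2: 2}
neighbors = {1: [0, 2], 2: [0, 1]}
```

The abelian property quoted above guarantees that the answer does not depend on which unstable vertex the loop picks first.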

Theorem 1.13 shows that R(P) and R2(P) are determined up to a log(1/π∗) factor by the singular values of S(P). However, it turns out that R^C_2 and R^H_2 are within a factor of 2 of T2(P←P). In fact we have,

Theorem 1.23. Let P denote an irreducible Cayley walk on a group G. Then

max(T2(P), T2(P←P)) ≤ R^C_2(P) ≤ R^H_2(P) ≤ 2T2(P←P)

In particular if P is a reversible ergodic Cayley walk on a group G, we have

T2(P) ≤ R^C_2(P) ≤ R^H_2(P) ≤ 2T2(P²) ≤ T2(P) + 1

Thus for a reversible ergodic Cayley walk, a holomorphic adversary cannot change the L2-mixing time. Theorem 1.23 will be proved in Section 5.

1.5 Applications: Card Shuffling

In this section we give some applications of the foregoing results in this paper.

Definition 1.24. By the chain P_{αβ;θ} we mean the card shuffling process on S_n where we choose two positions i and j according to the rules α and β respectively and apply the operation θ to the cards at positions i and j. Possible values for α and β are R (for random); A (for adversarial); C (for cyclic), i.e. t (mod n); k (for some fixed value); and T (for top, same as 1). A rule (γ/δ) means that we choose according to γ and δ with prespecified probabilities (or equal if not specified). Possible values for the operation θ are I (insertion), to move the card at position i to position j; and T (transpose), to exchange the cards at positions i and j. Similarly, an operation (γ/δ) means we apply the two operations with prespecified probabilities (or equal if not specified).

Note that the P_{αβ;θ} process is not necessarily Markovian, as the rules and the operation may depend on the time t. We start with an example of simulating a non-Markovian process by an adversarially modified Markov Chain.

Proposition 1.25. Let P_{RT;I} denote the random-to-top move chain, i.e., we pick a position r uniformly at random and move the card in position r to the top. Also let P_{RC;I} denote the random-to-cyclic insertion process, where we move a random card to position t (mod n). Then P_{RC;I} can be simulated by an adversarially modified P_{RT;I}.

Proof. Suppose k and r are arbitrary positions. Moving the card in position r to position k is equivalent to moving it to the top and then moving the top card to position k. Hence it follows that an adversarially modified P_{RT;I} chain can simulate a P_{RC;I} chain. Also note that the adversary involved in the simulation is a Cayley adversary.

The non-Markovian process we are mainly interested in in this section is the random-to-cyclic transposition process P_{RC;T}. The problem of estimating the mixing time of P_{RC;T} was raised by Aldous and Diaconis (1) in 1986. Recently Mironov (16) used this shuffle to analyze a cryptographic system known as RC4 and showed that P_{RC;T} mixes in time O(n log n), without an estimate on the hidden constant. Mossel et al. (19) showed that T(P_{RC;T}) = Θ(n log n).

Theorem 1.26. Let P_{RC;T} denote the random-to-cyclic transposition chain, i.e., at time t we exchange the cards at positions r and t (mod n), where r ∈ {1, . . . , n} is chosen uniformly at random. Then T2(P_{RC;T}) ≤ T2(P_{RT;T}) + 1 ≤ n log n + O(n).

Proof. In fact, we prove the following chain of inequalities: T2(P_{RC;T}) ≤ T2(P_{RA;T}) ≤ R^C_2(P_{RT;T}) ≤ R^H_2(P_{RT;T}) ≤ T2(P_{RT;T}) + 1 ≤ n log n + O(n).

For a particular choice of adversarial moves, the P_{RA;T} process can simulate the P_{RC;T} process. Hence T2(P_{RC;T}) ≤ T2(P_{RA;T}).

By convexity arguments, it is enough to consider the case that the adversary's choice is deterministic to estimate the mixing time of P_{RA;T}. Let α_t ∈ {1, . . . , n} denote an adversarial choice for time t (fixed before the process begins). We first observe that an adversarial version of P_{RT;T} can simulate P_{RA;T}. For k, r ∈ {1, . . . , n}, (k r) = (1 k)(1 r)(1 k). Hence if we let A_t correspond to right multiplication by (1 α_t)(1 α_{t+1}), it follows that the given adversarial modification of P_{RT;T} simulates P_{RA;T}. Since the simulation was done by a Cayley adversary, we have T2(P_{RA;T}) ≤ R^C_2(P_{RT;T}) ≤ R^H_2(P_{RT;T}).

From Theorem 1.23 it follows that R^H_2(P_{RT;T}) ≤ T2(P_{RT;T}) + 1 since P_{RT;T} is reversible. But T2(P_{RT;T}) ≤ n log n + O(n) ((7; 20)).

Another application is in estimating the mixing time of a mixture of two reversible Cayley walks on a group G.

Theorem 1.27. Let P1 and P2 be two reversible ergodic Cayley walks on a group G and put


Note that using standard results, one can only show T2(Q) ≤ min(T2(P_1), T2(P_2)) · O(log(|G|)).

Thus we have eliminated the pesky log |G| factor, which can be significant since usually |G| is exponential.

We finish this section with a few examples of estimating the mixing time of complex card shuffling chains in terms of that of simpler ones. Given card positions i and j, moving the card at position j to position i corresponds to right multiplying by the cycle C_ij = (i, i+1, . . . , j) if i < j, and C_ij = C_ji^{−1} if i > j. If i = j, C_ij is the identity permutation.

Example 3. "Move to a or b chain": Fix 1 ≤ a, b ≤ n and 0 < q = 1 − p < 1. Let R′ ∈ {R, R2, R^H, R^H_2, R^C, R^C_2}. Consider the following random-to-a-or-b move chain given by

P_{R(a/b);I} = p P_{Ra;I} + q P_{Rb;I}

i.e., we choose a random card and move it to position a with probability p or to position b with the remaining probability.

Observe that R′(P_{Ra;I}) = R′(P_{Rb;I}), because an adversary for one can simulate that for the other. Hence R′(P_{Ra;I}) = R′(P_{R1;I}) = R′(P_{RT;I}). Hence Theorem 1.9 implies R′(P_{R(a/b);I}) is bounded by 2R′(P_{RT;I}) − 1. Also R′(P_{RT;I}) ≤ R^H_2(P_{RT;I}) and P_{RT;I} ←P_{RT;I} = P_{RR;I}. Hence by Theorem 1.23, we have R^H_2(P_{R(a/b);I}) ≤ 4T2(P_{RR;I}). Corollary A.4 shows T2(P_{RR;I}) ≤ 1.5 n log n + O(n), giving a bound of 6 n log n + O(n) for R^H_2(P_{R(a/b);I}).

Example 4. "Transpose or Move chain": Let P_{RR;(T/M)} be the transpose-or-move chain, i.e., we pick two positions i and j at random; we transpose the selected cards with probability 1/2 and move the card at position i to position j with probability 1/2. Note that T2(P_{RR;T}) ≤ 0.5 n log n + O(n) (10), T2(P_{RR;I}) ≤ 1.5 n log n + O(n) (Corollary A.4), and that both are reversible chains. Hence we have T2(P_{RR;(T/M)}) ≤ 2 n log n + O(n).

1.6 Applications: Reversible lifting

Definition 1.28. Let P and Q be Markov Chains on state spaces X and Y with stationary distributions π and µ respectively. P is said to be a collapsing of Q if there exists a mapping f : Y → X such that the following hold:

• π(x) = µ(Y_x) for all x ∈ X, where Y_x = f^{−1}(x),

• For all x_1, x_2 ∈ X,

P(x_1, x_2) = Σ_{y_1 ∈ Y_{x_1}} Σ_{y_2 ∈ Y_{x_2}} µ_{x_1}(y_1) Q(y_1, y_2)   (5)

where µ_x is the conditional distribution of y ∈ Y given f(y) = x, i.e. µ_x(y) = µ(y)/π(x).

A lifting of P is a chain Q such that P is the collapsing of Q.
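Equation (5) can be checked numerically: collapse a small chain and verify that the result is stochastic and that π(x) = µ(Y_x) is stationary for it. A sketch (NumPy; names ours):

```python
import numpy as np

def collapse(Q, mu, f, nX):
    # P(x1, x2) = sum_{y1 in Y_{x1}} sum_{y2 in Y_{x2}} mu_{x1}(y1) Q(y1, y2),
    # where mu_x(y) = mu(y) / pi(x) and pi(x) = mu(Y_x).   (equation (5))
    nY = len(mu)
    pi = np.zeros(nX)
    for y in range(nY):
        pi[f[y]] += mu[y]
    P = np.zeros((nX, nX))
    for y1 in range(nY):
        for y2 in range(nY):
            P[f[y1], f[y2]] += (mu[y1] / pi[f[y1]]) * Q[y1, y2]
    return P, pi

# A doubly stochastic chain Q on 3 states (so mu is uniform), collapsed
# via f(0) = f(1) = 0 and f(2) = 1.
Q = 0.5 * np.eye(3) + 0.5 * np.roll(np.eye(3), 1, axis=1)
mu = np.full(3, 1.0 / 3)
P, pi = collapse(Q, mu, [0, 0, 1], 2)
```

Stationarity of π for the collapsed chain follows from summing (5) against π: the inner conditional weights µ_{x_1}(y_1)π(x_1) recombine to µ(y_1).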

Chen et al. (4) showed that if Q is a reversible lifting of a Markov chain P, then T(P) ≤ T(Q) log(1/π∗).

[ 1−υ  υ  0 · · · 0 ]
[ 1    0  0 · · · 0 ]   (6)
[ 0    0  I         ]

The states are indexed starting with y = argmax_x π(x) and the current state z ≠ y. Here υ = π(z)/π(y).

Figure 1: An adaptive adversary is unreasonably powerful

Theorem 1.29. Let Q be a lifting of P. Then R(Q) ≥ T(P).

If Q is reversible, Theorem 1.13 implies that R(Q) ≤ T(Q)(1 + log(1/µ∗)/2), where µ∗ = min_y µ(y). This immediately gives

Corollary 1.30. Let Q be a reversible Markov Chain with stationary distribution µ and P a collapsing of Q with stationary distribution π. Then T(Q) log(1/µ∗) ≥ T(P).

When µ∗ is only polynomially smaller than π∗, we have an alternate proof of the reversible lifting result. In order to fine tune the result, we look at the adversarial strategy used in the proof of Theorem 1.29 more closely and prove

Theorem 1.31. Let Q be a reversible lifting of P. Then T(Q) ≥ r(P).

This not only gives an alternate proof of the reversible lifting result of (4), it also shows that when T(P) = O(r(P)) no reversible lifting Q of P can mix faster than P (ignoring constant factors).

1.7 Discussion: The power of the adversary

We discuss the necessity of our restrictions on the adversary.

Our definition of robust mixing time requires that the adversary be oblivious, i.e., announce all his moves in advance. An adaptive adversary would be unreasonably powerful, as shown below. Let y be the state with maximum stationary probability and suppose we allow an adaptive adversary. The adversary can ensure that the chain is always at state y at the end of its turn as follows: Suppose the current state of the chain is z. If z = y, the adversary does nothing. Otherwise the adversary applies the stochastic matrix given in Figure 1. It is easily checked that this matrix is compatible with P and sends z to y with probability 1. For this reason we do not consider an adaptive adversary.

The requirement that the adversary's choices be compatible with P ensures that the AMMC process has any hope of having a limit.

2 Basic Properties

2.1 Submultiplicativity

We give two proofs of submultiplicativity. The first is based on the standard argument and applies to standard mixing time as well. The second proof is new as far as we know, and simplifies the proof for standard mixing time as well. Apart from these two proofs, the argument based on coupling can also be extended to handle the robust case.

Theorem 2.1. Let P be an ergodic Markov Chain and ε, δ > 0. Let S be a valid set of strategies against P. Then R^S(εδ/2) ≤ R^S(ε/2) + R^S(δ/2).

Proof. This proof is based on the standard proof using the triangle inequality. Let S = R^S(ε/2) and T = R^S(δ/2). Let A_{s,t} denote a sequence of t − s + 1 stochastic matrices in S.

By convexity it is enough to consider initial distributions concentrated on a single point. For x ∈ X, let δ_x denote the distribution concentrated on x. Define the following quantities:

∆(x, t, A_{1,t}) = ||µ_t − π||_TV

where µ_t is the t-step distribution using the adversarial strategy A_{1,t} and initial distribution δ_x.

Before we get to the new proof we need the following

Lemma 2.2. Let Q be a stochastic matrix for which πQ = π. Suppose we know that for all initial distributions µ, ||µQ − π||_TV ≤ ε. Then for all initial distributions µ,

||µQ − π||_TV ≤ ε min(1, 2||µ − π||_TV)   (7)

Proof. Let η = ||µ − π||_TV. It follows that we can write µ − π = η(ν_1 − ν_2) for appropriate distributions ν_1, ν_2. Then

||µQ − π||_TV = ||(µ − π)Q||_TV ≤ η||ν_1Q − π||_TV + η||ν_2Q − π||_TV ≤ 2ηε = 2ε||µ − π||_TV

The same proof using || · ||_{2,π} instead of || · ||_TV gives

Corollary 2.3. Let Q be a stochastic matrix for which πQ = π. Suppose we know that for all initial distributions µ, ||µQ − π||_{2,π} ≤ ε. Then for all initial distributions µ,

||µQ − π||_{2,π} ≤ ε min(1, 2||µ − π||_TV) ≤ ε min(1, ||µ − π||_{2,π})   (8)

Alternate proof of Theorem 2.1. Let S = R^S(ε/2) and T = R^S(δ/2). Let A_1, . . . , A_S, A_{S+1}, . . . , A_{S+T} be a sequence of stochastic matrices from S. Let C = PA_1 PA_2 · · · PA_S and D = PA_{S+1} PA_{S+2} · · · PA_{S+T}.

From the lemma above, we have

||µCD − π||_TV ≤ (δ/2) · 2||µC − π||_TV ≤ δε/2   (9)

Since the A_i were arbitrary, we have the result.

Theorem 2.4. Let P be an ergodic Markov Chain and ε > 0. Let S be a valid set of stochastic matrices compatible with P. Then the following are submultiplicative: R^S_2(ε) and R^S(ε/2). Moreover, for k > 1, we also have R^S_2(ε^k) ≤ R^S_2(ε) + (k−1) R^S(ε/2).

Proof. Theorem 2.1 shows that R^S(ε/2) is submultiplicative. Replacing the application of Lemma 2.2 with Corollary 2.3 in the proof of Theorem 2.1 shows that R^S_2(ε) is submultiplicative.

For the last part: Let T_1 = R^S(ε/2) and T_2 = R^S_2(ε). Then for any initial distribution µ_0, we have ||µ_{(k−1)T_1} − π||_TV ≤ ε^{k−1}/2 (by submultiplicativity for total variation distance). Now using the tighter inequality in Corollary 2.3, we have ||µ_{(k−1)T_1+T_2} − π||_{2,π} ≤ 2ε ||µ_{(k−1)T_1} − π||_TV ≤ ε^k.

The second part can be useful in obtaining better non-asymptotic bounds forL2 mixing when


2.2 Finiteness

Now we characterize chains with finite robust mixing time. First we observe that ergodicity is not enough to guarantee finiteness of R, as shown by the following examples.

Example 5. "Walk on directed edges": Let X be a connected d-regular undirected graph. Assume that the usual random walk on X is ergodic. Consider the same walk, except that this time we also keep track of the previous vertex. Thus the states of this walk are directed edges (u, v) of X. If at time t we are at (u, v), we move to (v, w) where w is a uniformly chosen neighbor of v.

Since the transition rule doesn't care which vertex we came from, it follows that the new walk is ergodic exactly when the usual random walk on X is. Also, the mixing time of the new walk is bounded by one more than that of the usual walk. This is because picking a random neighbor of a random vertex of a regular graph is equivalent to picking a random directed edge.

Consider the following adversarial strategy A: A((u, v)) = (v, u). The adversary simply reverses the direction of the edge. Let v be any vertex of X and let µ_0 be the uniform distribution on all edges coming into v. Then ν_1 is the uniform distribution on all edges going out of v. Applying the adversary's strategy we get µ_1 = µ_0. Thus R(P) = ∞.
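This obstruction is easy to reproduce numerically. The sketch below (NumPy; names ours) builds the directed-edge walk on K_4, applies the edge-reversing strategy for one round, and recovers µ_1 = µ_0 exactly.

```python
import numpy as np

n = 4  # K_4 is 3-regular and its usual random walk is ergodic
edges = [(u, v) for u in range(n) for v in range(n) if u != v]
idx = {e: i for i, e in enumerate(edges)}
m = len(edges)

# Walk on directed edges: from (u, v) move to (v, w), w a random neighbor of v.
P = np.zeros((m, m))
for (u, v) in edges:
    for w in range(n):
        if w != v:
            P[idx[(u, v)], idx[(v, w)]] = 1.0 / (n - 1)

# Adversary: reverse the edge, A((u, v)) = (v, u) -- a permutation matrix.
A = np.zeros((m, m))
for (u, v) in edges:
    A[idx[(u, v)], idx[(v, u)]] = 1.0

# mu_0 uniform on edges pointing into vertex 0.
mu0 = np.array([1.0 if v == 0 else 0.0 for (u, v) in edges])
mu0 /= mu0.sum()
mu1 = (mu0 @ P) @ A  # one round of the AMMC process
```

One round returns the chain exactly to µ_0, so no number of rounds ever brings it closer to uniform: the robust mixing time is infinite.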

Example 6. "Bottom-k to top shuffles": Let P_k denote the following Markov Chain on S_n. Given a pack of n cards, we pick a random card among the bottom k cards (where 1 ≤ k ≤ n) and move it to the top. Unless k = n, R(P_k) = ∞: the adversarial strategy of exchanging the top two cards ensures that the top card of the deck stays fixed (if k < n).

(12) defined "bottom-k to top shuffles" and showed that they mix in time between O(n log n) and O(n³ log n) as k varies from n down to 2. Note that k = n is the random-to-top transposition shuffle and k = 2 is related to the Rudvalis shuffle.

Theorem 2.5. Let P be an ergodic Markov Chain with stationary distribution π.

• R(ε) is finite for all ε > 0 iff σ_1(S(P)) < 1.

• If P has all holding probabilities positive, then σ_1(S(P)) < 1.

• If σ_1(S(P)) < 1, then 2||µ_t − π||_TV ≤ σ_1(S(P))^t √(1/π∗).
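The last bound is easy to check numerically for a chain with uniform π (so S(P) = P, and σ_1 is the second largest singular value; the largest is σ_0 = 1). A sketch (NumPy; names ours):

```python
import numpy as np

# A non-reversible doubly stochastic chain on 3 states (pi uniform).
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.3, 0.2, 0.5]])
n = P.shape[0]
pi = np.full(n, 1.0 / n)

svals = np.linalg.svd(P, compute_uv=False)  # sigma_0 >= sigma_1 >= ...
sigma1 = svals[1]

def tv(mu):
    return 0.5 * np.abs(mu - pi).sum()

# Check 2 ||mu_t - pi||_TV <= sigma_1^t sqrt(1/pi_min) from the worst
# point-mass start (here sqrt(1/pi_min) = sqrt(n)).
mus = np.eye(n)
for t in range(1, 11):
    mus = mus @ P
    worst = max(tv(mus[i]) for i in range(n))
    assert 2 * worst <= sigma1 ** t * np.sqrt(n) + 1e-12
```

The bound holds with no adversary present; Theorem 2.5 says the same decay rate survives any compatible adversarial interleaving.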

Proof. It is easy to see that ←P is also a Markov Chain with the same stationary distribution as P. Hence if we put A_t = ←P, we see that R(P, ε) ≥ T(P←P, ε).

Suppose σ_1(S(P)) = 1. Observe that the squares of the singular values of S(P) are just the eigenvalues of S(P)S(P)ᵀ = S(P←P). But S(P←P) and P←P are similar matrices. Hence it follows that P←P is not ergodic, and hence R(P) ≥ T(P←P) = ∞.

On the other hand, if σ_1(S(P)) < 1, it follows that

||µPA − π||_{2,π} = ||(µ − π)PA||_{2,π} ≤ ||(µ − π)P||_{2,π} ≤ σ_1 ||µ − π||_{2,π}   (10)

for any strategy A compatible with P. Iterating (10) gives ||µ_t − π||_{2,π} ≤ σ_1^t ||µ_0 − π||_{2,π} ≤ σ_1^t √(1/π∗) for any distribution µ_0. Finally, using the fact that 2||µ − π||_TV ≤ ||µ − π||_{2,π}, we have 2||µ_t − π||_TV ≤ σ_1(S(P))^t √(1/π∗).

If P has all holding probabilities positive, write P = aI + (1−a)Q for some a > 0 and observe that

||xS(P)||_2 ≤ a||x||_2 + (1−a)||xS(Q)||_2 ≤ ||x||_2   (11)

Thus ||xS(P)||_2 = ||x||_2 implies xS(P) = x, and hence ergodicity of P implies that x√Π is a multiple of π. Hence σ_1(S(P)) < 1.

In case P has uniform stationary distribution (hence S(P) = P) and σ_1(P) = 1, we can easily construct an adversarial strategy as follows. Let α denote the left singular vector corresponding to σ_1(P) and β = αP. Since P is a convex combination of permutation matrices and ||β||_2 = ||α||_2, it follows that β is a permutation of α. Let I = {x ∈ X : α(x) ≥ α(y) ∀y ∈ X}, i.e., the states x where α(x) attains its maximum. Similarly let J ⊆ X be the set of states x where β(x) attains its maximum. Since β is a permutation of α, it follows that P maps the uniform distribution on I to the uniform distribution on J (consider the initial distribution π + εα for a small enough ε). Since α is not the constant vector, I and J are non-trivial subsets of X. Hence the adversary can choose a permutation on X which maps J to I, and the initial distribution is taken to be uniform on I.

We now prove Theorem 1.13.

Proof of Theorem 1.13. Let σ = σ1(S(P)). The case σ = 1 is easy, so we assume σ < 1. By considering the constant adversarial strategies I and ←P, we have max(T(P←P), T(P)) ≤ R(P). Consider the reversible Markov chain Q = P←P, whose second largest eigenvalue is σ². Since Q is reversible, standard results imply r(Q) ≤ T(Q). By definition of relaxation time, we have

σ^{2r(Q)} = (σ²)^{r(Q)} ≤ 1/e

Taking t = 2kr(Q) in Theorem 2.5 gives

2||µ_{2kr(Q)} − π||_TV ≤ e^{−k} √(1/π*)

Put k = log(1/π*)/2 + 1 to ensure 2||µ_{2kr(Q)} − π||_TV ≤ 1/e < 1/2. This gives the first half of the result.

If P is reversible, then S(P) is symmetric and hence σ1(S(P)) = λ*(S(P)) = λ*(P). Clearly R(P) ≥ T(P) ≥ r(P). By definition of r(P), we have λ*^{r(P)} ≤ 1/e. Now substitute t = r(P)(log(1/π*)/2 + 1) in Theorem 2.5 to conclude that R(P) ≤ r(P)(log(1/π*)/2 + 1).

The tightness of this inequality is discussed at the end of Section 3.

2.3 Convexity


Lemma 2.6. Let P be an irreducible Markov Chain and S a valid set of strategies against P. If Q ∈ S, then R′(PQ) ≤ R′(P) for R′ ∈ {R^S, R^S_2}. In particular, if P is an irreducible Cayley walk on a group G and Q any random walk on G (not necessarily irreducible), then R′(PQ) ≤ R′(P) for R′ ∈ {R^C, R^H, R^C_2, R^H_2}.

Proof. Let P = (PQ, {At}_{t>0}) be any adversarially modified version of PQ where At ∈ S. Then P′ = (P, {QAt}_{t>0}) is an adversarially modified version of P, where QAt ∈ S since Q ∈ S and S is closed under products. Moreover, the mixing times of P and P′ are equal. Taking the supremum over P, we have the result.

For the case of Cayley walks, we just observe that the transition matrices of all Cayley walks belong to C ⊆ H.

We now show that the robust mixing time of a convex combination of Markov Chains can be bounded in terms of that of the participating chains.

Let P and Q be two irreducible Markov Chains with the same stationary distribution π. Suppose S is a valid set of strategies against P and Q. Also assume P ∈ S and Q ∈ S. Fix 0 < a = 1 − b < 1 and consider the chain aP + bQ. Let P = (aP + bQ, {At}_{t>0}) be any adversarially modified version of aP + bQ. Fix S > 0 and ε = (ε1, ..., εS) where εi ∈ {0, 1}. Define the following quantities:

• P(0) = Q, P(1) = P

• ξ(ε) = Π_{i=1}^S P(εi)Ai

• H(ε) = Σ_{i=1}^S εi

• w(ε) = Π_{i=1}^S a^{εi} b^{1−εi} = a^{H(ε)} b^{S−H(ε)}

If µ0 is any initial distribution and µS is the distribution after S rounds, we have

µS − π = Σ_ε w(ε) (µ0 ξ(ε) − π)

where the sum ranges over all 2^S choices for ε = (ε1, ..., εS).
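The decomposition of µS − π can be verified directly by expanding the matrix product. A small numpy sketch (the chains and adversarial permutations below are randomly generated illustrations):

```python
import numpy as np
from itertools import product as iproduct

rng = np.random.default_rng(1)
n, S = 4, 3
a = 0.3; b = 1 - a

def rand_doubly_stochastic(rng, n):
    # Sinkhorn scaling of a positive matrix gives a doubly stochastic matrix
    M = rng.random((n, n)) + 0.1
    for _ in range(200):
        M /= M.sum(axis=1, keepdims=True)
        M /= M.sum(axis=0, keepdims=True)
    return M / M.sum(axis=1, keepdims=True)

P = rand_doubly_stochastic(rng, n)
Q = rand_doubly_stochastic(rng, n)
A = [np.eye(n)[rng.permutation(n)] for _ in range(S)]   # adversarial permutations

# Left side: S rounds of (aP + bQ), each followed by the adversary's A_i.
lhs = np.eye(n)
for i in range(S):
    lhs = lhs @ (a * P + b * Q) @ A[i]

# Right side: sum over the 2^S patterns eps of w(eps) * xi(eps).
rhs = np.zeros((n, n))
for eps in iproduct([0, 1], repeat=S):
    xi = np.eye(n)
    for i in range(S):
        xi = xi @ (P if eps[i] == 1 else Q) @ A[i]
    H = sum(eps)
    rhs += a**H * b**(S - H) * xi

assert np.allclose(lhs, rhs)
```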

Lemma 2.7. Let P and Q be ergodic Markov Chains with the same stationary distribution. Let S be a valid set of strategies against both P and Q, and assume that P ∈ S and Q ∈ S. Let 0 < a = 1 − b < 1. Then for R′ ∈ {R^S, R^S_2}, R′(aP + bQ) ≤ R′(P) + R′(Q) − 1. In particular, if P and Q are ergodic Cayley walks on a group G, then we have R′(aP + bQ) ≤ R′(P) + R′(Q) − 1 for R′ ∈ {R^H, R^C, R^H_2, R^C_2}.

Proof. Choose S = R^S(P) + R^S(Q) − 1. Then we have

||µS − π||_TV ≤ Σ_ε w(ε) ||µ0 ξ(ε) − π||_TV

Now for each ε, ξ(ε) either contains R^S(P) occurrences of P or contains R^S(Q) occurrences of Q. The remaining matrices can be considered as an adversarial choice. Hence we have R^S(aP + bQ) ≤ R^S(P) + R^S(Q) − 1.

Similarly, taking S = R^S_2(P) + R^S_2(Q) − 1 and looking at ||µS − π||_{2,π}, we get R^S_2(aP + bQ) ≤ R^S_2(P) + R^S_2(Q) − 1.

In case P and Q are Cayley walks on G, we just observe that P and Q are valid choices for a Cayley adversary and hence also for a holomorphic adversary.

Now we consider the case when P has finite robust mixing time and Q may not. We start with a concentration inequality.

Lemma 2.8. Let S = CT/p for C > 1 and 0 < p < 1. Let Z1, ..., ZS be independent Bernoulli random variables with Pr{Zi = 1} = p, and let Z = Σi Zi. Then we have

log Pr{Z < T} ≤ −T((C − 1) − log C)    (12)

Proof. We use Hoeffding's inequality (13, Theorem 1) for S − Z to conclude

Pr{Z < T} ≤ exp(−S·D(T/S ∥ p)) ≤ exp(−T((C − 1) − log C))

where D(·∥·) denotes the relative entropy between Bernoulli distributions.
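Lemma 2.8 can be sanity-checked against the exact binomial tail (pure Python; the parameter values below are arbitrary). Note that Pr{Z < T} = Pr{Z ≤ T − 1}:

```python
import math

def log_tail_below(S, p, T):
    """log Pr{Binomial(S, p) < T}, computed exactly from the binomial pmf."""
    prob = sum(math.comb(S, j) * p**j * (1 - p)**(S - j) for j in range(T))
    return math.log(prob)

p, C, T = 0.3, 2.0, 15
S = int(C * T / p)                        # S = CT/p = 100, so E[Z] = CT = 30
bound = -T * ((C - 1) - math.log(C))      # the bound of Lemma 2.8
assert log_tail_below(S, p, T) <= bound
```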

Lemma 2.9. Let P be an ergodic Markov Chain and S a valid set of strategies against P. Fix Q ∈ S and 0 < a = 1 − b < 1. Let T ≥ R^S(P, γ) and S = (1 + δ)T/a, where γ > 0 and δ > 0 are arbitrary. Then for any adversarially modified version of aP + bQ with strategies drawn from S,

||µS − π||_TV ≤ γ + exp(−T(δ − log(1 + δ)))

Corollary 2.10. Let P be an ergodic Markov Chain and Q be compatible with P. Let S be a valid set of strategies against P, and assume Q ∈ S. Let 0 < a = 1 − b < 1. Then R^S(aP + bQ) ≤ 2(1 + δ)R^S(P)/a, as long as 2R^S(P)(δ − log(1 + δ)) ≥ log 8. If R^S(P) ≥ 11, then δ may be taken to be 1/2.

In particular, if P is an ergodic Cayley walk on the group G and Q is any Cayley walk on G, the same conclusion holds with R^S replaced by R^H and R^C.

Proof. Let T = R^S(P) and S = 2T(1 + δ)/a. By submultiplicativity, we have R^S(P, γ) ≤ 2T for γ = 1/8. Lemma 2.9 now gives

||µS − π||_TV ≤ 1/8 + exp(−2T(δ − log(1 + δ)))

If 2T(δ − log(1 + δ)) ≥ log 8, we have ||µS − π||_TV ≤ 1/8 + 1/8 = 1/4 as required.
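The threshold R^S(P) ≥ 11 for δ = 1/2 is a simple numeric fact; the sketch below checks it, and that T = 10 just fails:

```python
import math

log8 = math.log(8)
# Corollary 2.10's condition: 2*T*(delta - log(1 + delta)) >= log 8
cond = lambda T, delta: 2 * T * (delta - math.log(1 + delta))

assert cond(11, 0.5) >= log8     # T = 11 satisfies the condition with delta = 1/2 ...
assert cond(10, 0.5) < log8      # ... while T = 10 falls just short
```

So 11 is exactly the smallest integer value of R^S(P) for which δ = 1/2 works.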

Similarly, for the S-robust L2-mixing time we get:

Lemma 2.11. Let P be an ergodic Markov Chain and S a valid set of strategies against P. Fix Q ∈ S. Let 0 < a = 1 − b < 1 and let S = (1 + δ)R^S_2(P, γ)/a, where γ > 0 and δ > 0 are arbitrary.

Proof. This proof is similar to that of Lemma 2.9, except for a slightly different choice of parameters. Put T = R^S_2(P, γ) and S = T(1 + δ)/a. Put D = {ε : H(ε) ≥ T}. If ε ∈ D, then ξ(ε) contains at least T occurrences of P.

Corollary 2.12. Let P be an ergodic Markov Chain and S a valid set of strategies against P. Let Q be compatible with P and Q ∈ S. Let 0 < a = 1 − b < 1. Assume that R^S

3 Reversible liftings

In this section we reprove a result due to Chen et al. (4) on reversible liftings of Markov Chains. The proof is inspired by considering the Robust mixing time of a Markov Chain and looking at a particular adversarial strategy. We start with a proof of Theorem 1.29.

Proof of Theorem 1.29. We prove R(Q) ≥ T(P) by exhibiting an adversarial strategy which allows the adversary to simulate the evolution of P.

Consider the following adversarial strategy A: given y ∈ Y, the adversary picks a state y′ ∈ Y according to the distribution µx where x = f(y). Recall that µx is the conditional distribution of µ given that f(y) = x.

Since µ = Σ_{x∈X} π(x)µx, it follows that this strategy fixes the stationary distribution µ of Q. We now claim that with this strategy the adversary can simulate the evolution of P on Y. For a distribution ν on X, consider the distribution F(ν) = Σ_{x∈X} ν(x)µx on Y. One step of Q followed by the strategy A carries F(ν) to F(νP), so the projection onto X of the adversarially modified chain evolves exactly as P. This ensures that this adversarially modified Q cannot mix faster than P. Hence R(Q) ≥ T(P).

Now if Q were reversible, Theorem 1.13 implies R(Q) ≤ T(Q)(1 + log(1/µ*)/2). Hence

T(Q) ≥ T(P) / (1 + log(1/µ*)/2)
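The simulation step in this proof can be sketched numerically. In the sketch below (numpy assumed; the chain Q, the projection f, and the sizes are made-up illustrations, and the collapsed chain P is computed from Q as in the lifting setup), the adversary's matrix A redistributes mass within each fibre Y_x according to µx, and one checks that F(ν)QA = F(νP) and that the strategy fixes µ:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical small instance: Y of size 5 projecting onto X of size 2.
f = np.array([0, 0, 1, 1, 1])        # the projection map f : Y -> X
nY, nX = 5, 2

Qm = rng.random((nY, nY)) + 0.1      # an ergodic chain Q on Y
Qm /= Qm.sum(axis=1, keepdims=True)

# stationary distribution mu of Q (left eigenvector for eigenvalue 1)
w, V = np.linalg.eig(Qm.T)
mu = np.real(V[:, np.argmax(np.real(w))]); mu /= mu.sum()

pi = np.array([mu[f == x].sum() for x in range(nX)])       # pushforward of mu
mux = [(mu * (f == x)) / pi[x] for x in range(nX)]         # conditionals mu_x

# The collapsing P of Q: start from mu_x, take a Q-step, project back to X.
P = np.array([[(mux[x] @ Qm)[f == x2].sum() for x2 in range(nX)]
              for x in range(nX)])

# Adversarial strategy A: redistribute mass inside each fibre Y_x as mu_x.
A = np.zeros((nY, nY))
for y in range(nY):
    A[y] = mux[f[y]]

def F(nu):
    """Lift a distribution nu on X to Y using the conditionals mu_x."""
    return sum(nu[x] * mux[x] for x in range(nX))

nu = rng.dirichlet(np.ones(nX))
assert np.allclose(F(nu) @ Qm @ A, F(nu @ P))   # one modified Q-step = one P-step
assert np.allclose(mu @ Qm @ A, mu)             # the strategy fixes mu
```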


Proof of Theorem 1.31. Let A denote the stochastic matrix representing the adversarial strategy. Note that A is reducible, but on each irreducible component it reaches stationarity in one step. It follows that A is reversible.

Let α denote the eigenvector (of length |X|) corresponding to λ*(P), and define β (of length |Y|) via β(y) = α(x)µx(y) where x = F(y). From our analysis before, it follows that for all x ∈ X,

Σ_{y∈Yx} (βQ)(y) = λ*(P)α(x)

Since A redistributes the probability in each Yx according to µx, we have for y ∈ Y,

β(QA)(y) = λ*(P)α(x)µx(y) = λ*(P)β(y)

where x = F(y). Thus β is an eigenvector of QA with eigenvalue λ*(P). Since A is a contraction (it is stochastic), we have

|λ*(Q)| ≥ |λ*(QA)| ≥ |λ*(P)|

Hence T(Q) ≥ r(Q) ≥ r(P).

As a consequence, it follows that for many natural reversible chains where T(P) = O(r(P)), one cannot gain more than a constant factor improvement by considering a reversible lifting. (4) also gives an example Q of a reversible random walk on a tree (with π* exponentially small) and its collapsing P for which

T(Q) = Θ( T(P) log log(1/π*) / log(1/π*) )

Since we know that R(Q) ≥ T(P), this shows that Theorem 1.13 is almost tight, even for reversible chains.

4 Upper bounds on Robust mixing time

In this section we observe that many proof techniques which establish bounds on mixing time in fact give us bounds on the Robust mixing time.

Definition 4.1. Let P be the transition matrix of an ergodic Markov Chain with stationary distribution π. For vectors α, β, put ⟨α, β⟩π = Σ_{x∈X} α(x)β(x)π(x). Associate the following bilinear forms with P:

E_P(α, β) = ⟨(I − P)α, β⟩π
F_P(α, β) = ⟨(I + P)α, β⟩π

E_P is the Dirichlet form associated with P, and F_P is used to bound the negative eigenvalues of P.

Definition 4.2. Let π be a distribution on X with π(x) > 0 for all x ∈ X. For any function f : X → R+, define the entropy of f via

Ent(f) = Eπ[f log f]    (13)

and for a distribution µ on X, put Ent(µ) = D(µ||π) = Ent(f), where f(x) = µ(x)/π(x) is the density function of µ w.r.t. π.

Ent(µ) measures the entropy of µ relative to π. It is easy to see that Ent(·) is convex, non-negative, and vanishes exactly when µ = π. The log-Sobolev constant is defined as follows.

Definition 4.3. Let P be a Markov Chain satisfying πP = π. The log-Sobolev constant α(P) is defined as

α(P) = min_f E_P(f, f) / Ent(f²)    (14)

where the minimum is taken over all real valued functions f on X for which ⟨f, f⟩π = 1.

Lemma 4.4. Let P be an ergodic Markov Chain with stationary distribution π. Let µ be any distribution and A any stochastic matrix compatible with P. Then for any p ≥ 1,

(a) ||µA − π||_{p,π} ≤ ||µ − π||_{p,π}

(b) Var_π(µA) ≤ Var_π(µ)

(c) ||µA − π||_TV ≤ ||µ − π||_TV

(d) D(µA||π) ≤ D(µ||π)

Proof. Since A is stochastic and compatible with P, it follows that A is a contraction on L_p(π) for p ≥ 1. This proves (a); (b) and (c) are special cases of (a).

A being a contraction on L2(π) implies σ1(S(A)) ≤ 1. Hence the log-Sobolev constant α(A←A) ≥ 0. (15, Prop 6) shows that D(µA||π) ≤ D(µ||π) when α(A←A) ≥ 0.
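Lemma 4.4 can be sanity-checked numerically. The sketch below (numpy assumed; uniform π for simplicity, so any doubly stochastic A is compatible) verifies parts (a) with p = 2, (c), and (d) for a random A built in Birkhoff form, i.e. as a convex combination of permutation matrices:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
pi = np.full(n, 1.0 / n)

def rand_doubly_stochastic(rng, n):
    # convex combination of a few permutation matrices (Birkhoff form)
    perms = [np.eye(n)[rng.permutation(n)] for _ in range(4)]
    wts = rng.dirichlet(np.ones(4))
    return sum(w * Pm for w, Pm in zip(wts, perms))

A = rand_doubly_stochastic(rng, n)     # compatible with any uniform-pi chain
mu = rng.dirichlet(np.ones(n))
nu = mu @ A

tv = lambda p, q: 0.5 * np.abs(p - q).sum()
kl = lambda p, q: np.sum(p * np.log(p / q))
l2 = lambda p: np.sqrt(np.sum((p - pi) ** 2 / pi))   # L2(pi) distance to pi

assert tv(nu, pi) <= tv(mu, pi) + 1e-12    # part (c)
assert kl(nu, pi) <= kl(mu, pi) + 1e-12    # part (d)
assert l2(nu) <= l2(mu) + 1e-12            # part (a) with p = 2
```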

As observed in (18, Chapter 3), many upper bounds on mixing time can be described via the following approach:

• Let V(η) be a scalar associated with the distribution η which satisfies the following:

  – V(η) ≥ 0 for all distributions η,

  – V(ηt) → 0 ⟺ ηt → π, for any sequence of distributions ηt, and

  – V(ηP) ≤ V(η).

• Let I(t) = V(µt) where µt is the t-step distribution. Note that I(t) is a non-increasing function.

• Establish a relation of the form I(t) − I(t + 1) ≥ G(I(t)) for an appropriate function G.

• Using V(0) = I(0), solve for T by which V(T) ≤ 1/4 (or some appropriate constant < 1 depending on the choice of V).

In case of an AMMC P = (P, {At}) we can do the following:

• V will usually also satisfy V(µA) ≤ V(µ) for all A compatible with P.

• Let I(t) = V(µt) and J(t) = V(νt), where νt = µtP and µt+1 = νtAt.

• I(t) − J(t) ≥ G(I(t)) and I(t + 1) ≤ J(t) imply I(t) − I(t + 1) ≥ G(I(t)) as before.

Hence bounds on mixing time given by methods following the above approach also apply to Robust mixing time. This includes most analytical methods used to establish upper bounds on mixing time, including log-Sobolev inequality based bounds, conductance based bounds and congestion based bounds (17). Bounds using the Entropy constant (a.k.a. Modified log-Sobolev constant) do not lead to a Robust mixing upper bound, as they are not known to work in discrete time.

Unlike standard mixing time, establishing a bound on R(P²) does not automatically imply any bound on R(P).

Example 7. Let P be the walk on directed edges of a "good enough" undirected regular expander X. We have already seen that R(P) = ∞ and T(P) < ∞. One can also show that R(P²) < ∞, thus showing that if the adversary is only allowed to intervene every alternate step, any adversarially modified version of P will still converge. A heuristic argument is that since the chain P remembers only the previous step of the usual random walk on X, when we run two steps of the chain in succession, the adversary does not gain any extra knowledge. A formal proof of R(P²) < ∞ is given in Theorem 4.11.

However, techniques similar to those used in Theorem 1.9 can be used to show:

Lemma 4.5. Let P, Q be compatible Markov Chains and k > 1. Then for 0 < a = 1 − b < 1, T(aP + bQ) ≤ C·kR(P^k)/a^k for some absolute constant C.

Proof. Consider t = CkR(P^k)/a^k for C to be chosen later. Condition the run on the results of the initial coin tosses (which decide whether we run P or Q at each step). Break the run of aP + bQ into t/k blocks of length k. The expected number of all-P blocks is C·R(P^k). For large enough C, we have at least 2R(P^k) all-P blocks with high probability. This together with an application of a Chernoff type concentration inequality gives the result.

Coupling is one common approach to estimate the mixing time of many Markov Chains. A coupling proof does not automatically give bounds on the robust mixing time. Many coupling proofs start by defining some notion of distance and show that it decreases in expectation after each step. Such coupling proofs lead to a bound on robust mixing time if one restricts to adversaries who cannot increase the distance.


For the lazy random walk on the hypercube, where the classical coupling controls the Hamming distance between the two copies, moves of a Cayley adversary amount to flipping a subset of coordinates. Flipping the same subset in the other copy of the chain ensures that the Hamming distance is not changed. Thus we have a coupling based proof of the n log n + O(n) mixing time bound for the lazy random walk against a Cayley adversary.

This also works against a holomorphic adversary, as automorphisms of the hypercube correspond to a combination of a permutation of the coordinates and flipping a subset of the coordinates.

Example 9. Consider the random-to-top move chain, where we pick a random card and move it to the top position. The coupling proof starts with two copies of the chain and couples them so that after the move the two copies have the same top card. Thus we have created a pairing of cards in the two copies. The proof that we have not destroyed any existing pairing depends on the fact that the paired cards are always contiguous.

A Cayley adversary can easily destroy the contiguous nature of the paired cards, and thus in the next step one may actually see a decrease in the number of paired cards.

Thus the naïve extension of the coupling based proof does not work in the robust setting.

The log-Sobolev constant as originally introduced by Diaconis works only in continuous time and needs to be adapted for the discrete time case. Miclo (15) adapted the result to discrete time and showed that Ent(µ) − Ent(µP) ≥ α(P←P) Ent(µ), where α(P←P) is the log-Sobolev constant of the reversible chain P←P. This immediately translates to a Robust mixing time bound as well.

Lemma 4.6. Let P be an ergodic Markov Chain with stationary distribution π. Let α = α(P←P) denote the log-Sobolev constant for the chain P←P. Then R(P) = O(log log(1/π*)/α).

Proof. (15, Proposition 6) shows that after each application of P, the entropy of the resulting distribution falls by a factor 1 − α(P←P). Clearly adversarial moves cannot increase the entropy. Since the initial entropy is ≤ log(1/π*), the result follows.

Lemma 4.7. Let P be a reversible Markov Chain. Then

α( ((I + P)/2)² ) ≥ α(P)/2    (15)

If P has non-negative eigenvalues, then α(P²) ≥ α(P).

Proof. Suppose P has non-negative eigenvalues. Let 1 = λ0 > λ1 ≥ ··· ≥ λ_{N−1} ≥ 0 be the eigenvalues of P, and let xi be the corresponding eigenvectors, which form an orthonormal basis of L2(π). Write f = Σ_i ci xi for appropriate scalars ci. Then

E_{P²}(f, f) = ⟨(I − P²)f, f⟩π = Σ_i ci²(1 − λi²) ≥ Σ_i ci²(1 − λi) = ⟨(I − P)f, f⟩π = E_P(f, f)    (16)

since 1 − λ² ≥ 1 − λ for λ ∈ [0, 1].
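The Dirichlet form comparison E_{P²}(f, f) ≥ E_P(f, f) for reversible P with non-negative eigenvalues can be checked numerically (numpy assumed; the lazy chain below is an arbitrary example with uniform π, not one from the paper):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6

def rand_doubly_stochastic(rng, n):
    M = rng.random((n, n)) + 0.1
    for _ in range(200):
        M /= M.sum(axis=1, keepdims=True)
        M /= M.sum(axis=0, keepdims=True)
    return M / M.sum(axis=1, keepdims=True)

B = rand_doubly_stochastic(rng, n)
W = (B + B.T) / 2                 # symmetric doubly stochastic => reversible, pi uniform
P = (np.eye(n) + W) / 2           # lazy chain: all eigenvalues in [0, 1]
assert np.linalg.eigvalsh(P).min() >= -1e-12

pi = np.full(n, 1.0 / n)
E = lambda K, f: np.sum((f - K @ f) * f * pi)   # Dirichlet form E_K(f, f)

for _ in range(100):
    f = rng.standard_normal(n)
    assert E(P @ P, f) >= E(P, f) - 1e-12       # hence alpha(P^2) >= alpha(P)
```

Since the inequality holds for every test function f, it passes to the minimizer in the definition of the log-Sobolev constant.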


Example 10. Lazy random walk on the hypercube: Let P denote the (reversible) lazy random walk on the n-dimensional hypercube. All the eigenvalues of P are of the form k/n for k ∈ {0, ..., n}. (6) shows that the log-Sobolev constant for the hypercube is Θ(1/n), and hence we have R(P) = O(n log log(2^n)) = O(n log n) = Θ(T(P)).

Theorem 4.8. Let P_{RT;T} denote the random-to-top transposition chain. Then R((I + P_{RT;T})/2) = O(n log² n).

Proof. We start by estimating the log-Sobolev constant of P_{RT;T}. Lemma A.2 shows that E_{P_{RR;T}} ≤ 18 E_{P_{RT;T}}. (2) shows that α(P_{RR;T}) ≥ 1/(2n log n). Hence, by the definition of the log-Sobolev constant, it follows that α(P_{RT;T}) ≥ 1/(36n log n). Lemma 4.7 now implies

α( ((I + P_{RT;T})/2)² ) ≥ 1/(72n log n)    (17)

The result now follows from Lemma 4.6.

If we try to estimate the robust mixing time of P_{RT;T} via log-Sobolev methods directly via Lemma 4.6, one would need to estimate the log-Sobolev constant of P²_{RT;T}. The bound in Theorem 4.8 is against an adversary who is not required to respect the group structure. Later we show the optimal O(n log n) bound if we require the adversary to respect the group structure.

4.1 A spectral result

In this subsection, we prove a new rapid mixing result based on spectral machinery, which can be applied to chains whose top singular values are equal to or very close to 1. We use this to show that R(P²) < ∞, where P is the walk on directed edges considered before. For simplicity, we assume that P has uniform stationary distribution throughout the rest of this section. The general case follows by considering S(P) instead of P.

The approach of this spectral result is the following: call a subspace M "bad" if vectors in M need not contract much when P is applied to them, and call a subspace "good" if every vector in it contracts a lot when P is applied.

In case of a reversible Markov chain P, the "bad" subspace is usually invariant under P (e.g. the eigenvector corresponding to the eigenvalue λ*). However, if P is non-reversible, P may map a "bad" subspace to a "good" subspace. In such a case, two applications of P should contract a vector.

Let P be a Markov chain on N states with uniform stationary distribution π. Consider the Singular Value Decomposition of P, and let x0, ..., x_{N−1} and y0, ..., y_{N−1} denote the left and right singular vectors of P. Since π is uniform, we may assume x0 = y0 = the constant vector. Let X(P) (resp. Y(P)) denote the matrix whose columns are the xi (resp. yi), and Σ(P) the diagonal matrix consisting of the singular values σi of P, so that the following equation holds:

P = X(P)Σ(P)Y(P)^T


Definition 4.9. Let P be a Markov chain with uniform stationary distribution and X(P), Y(P) be as above. Define the inner product matrix IP(P) via

IP(P) = Σ(P)Y(P)^T X(P)

Note that IP(P)_{ij} = ⟨xi P, xj⟩ and that IP(P) is just P written in the xi basis. For k = 1, ..., N − 1, denote by

• E_k, the linear span of x1, ..., x_{k−1},

• E′_k, the linear span of x_k, ..., x_{N−1},

• L_k, the orthogonal projection (of the N dimensional vector space) onto E_k,

• c_k, the Euclidean norm of the E_k → E_k operator z ↦ zPL_k.

Note that c_k is the norm of the (k − 1) × (k − 1)-minor of IP(P) consisting of rows and columns 2, ..., k.

Note that in terms of our earlier discussion, the subspace E_k is "bad" as it corresponds to high singular values, and the subspace E′_k is "good" as it corresponds to vectors of low singular values.

Theorem 4.10. Let P be a doubly stochastic matrix with SVD (xi, yi, σi). Then for any initial distribution µ, any t > 0 and any 1 ≤ k ≤ N, we have

||µP^{2t} − π||_2 ≤ (σ1(c_k + 2σ_k))^t ||µ − π||_2

where π is the uniform stationary distribution. When k = N, we take c_N = σ1 and σ_N = 0.

Proof. Let (xi, yi, σi)_{i=0}^{N−1} be the Singular Value Decomposition of P. Let E = E_k be the subspace spanned by x1, ..., x_{k−1} and E′ = E′_k be the subspace spanned by x_k, ..., x_{N−1}. Also, denote by L the orthogonal projection onto E and by L′ the orthogonal projection onto E′.

Since xiP = σi yi, we see that P maps E and E′ to orthogonal subspaces. By definition, c_k is the Euclidean norm of the operator LPL on E ⊕ E′. We now show that the Euclidean norm of L′P on E ⊕ E′ is at most σ_k. Also note that the Euclidean norm of the operator P on E ⊕ E′ is at most σ1.

Let z′ ∈ E′ be arbitrary. Then

z′ = Σ_{i≥k} αi xi

for some scalars αi, since E′ is spanned by x_k, ..., x_{N−1}. Since xiP = σi yi, we have

||z′P||²_2 = Σ_{i≥k} αi² ||σi yi||² ≤ σ_k² Σ_{i≥k} αi² = σ_k² ||z′||²_2    (18)


Let z be any vector orthogonal to π. Then

zP² = zLP² + zL′P² = zLPLP + zLPL′P + zL′P²    (19)

Estimating the individual norms, we get

||zLPLP||_2 ≤ σ1 ||zLPL||_2 ≤ σ1 c_k ||z||_2    (20)

||zLPL′P||_2 ≤ σ_k ||zLP||_2 ≤ σ_k σ1 ||z||_2    (21)

||zL′P²||_2 ≤ σ1 ||zL′P||_2 ≤ σ1 σ_k ||z||_2    (22)

Combining, we get

||zP²||_2 ≤ σ1(c_k + 2σ_k) ||z||_2

If µ is any initial distribution, µ − π is orthogonal to π, and hence we have, for t > 0,

||µP^{2t} − π||_2 = ||(µ − π)P^{2t}||_2 ≤ (σ1(c_k + 2σ_k))^t ||µ − π||_2

The assumption that π is uniform is only for notational convenience. In the general case, we consider S(P), whose left and right singular vectors corresponding to σ0 = 1 are equal.
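Theorem 4.10 can be checked numerically against a random doubly stochastic matrix (numpy assumed; the choice k = 3 is arbitrary). The inner product matrix IP(P) and the minor norm c_k are computed directly from the SVD:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 7

def rand_doubly_stochastic(rng, n):
    M = rng.random((n, n)) + 0.1
    for _ in range(300):
        M /= M.sum(axis=1, keepdims=True)
        M /= M.sum(axis=0, keepdims=True)
    return M / M.sum(axis=1, keepdims=True)

P = rand_doubly_stochastic(rng, n)
U, S, Vt = np.linalg.svd(P)            # P = U diag(S) Vt; x_i = U[:, i] as row vectors

# Inner product matrix: IP(P)_{ij} = <x_i P, x_j> = sigma_i <y_i, x_j>
IP = np.diag(S) @ Vt @ U

k = 3
ck = np.linalg.norm(IP[1:k, 1:k], 2)   # norm of the minor on rows/columns 2..k
sigma1, sigmak = S[1], S[k]            # singular values off the stationary direction

pi = np.full(n, 1.0 / n)
mu = rng.dirichlet(np.ones(n))
z = mu - pi                            # orthogonal to the all-ones vector
lhs = np.linalg.norm(z @ P @ P)
rhs = sigma1 * (ck + 2 * sigmak) * np.linalg.norm(z)
assert lhs <= rhs + 1e-10
```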

Theorem 4.11. Let P denote the walk on directed edges of an undirected d-regular graph X considered before. Let Q denote the standard random walk on X and r(Q) its relaxation time. Then

R_2(P²) ≤ r(Q)(2 + log(nd))

where n denotes the number of vertices of X and d ≥ 2.

Proof. This is an application of Theorem 4.10. We calculate the Singular Value Decomposition of P and show that for a suitable k, σ1 = 1, c_k = λ*, σ_k = 0, where λ* = λ*(Q). Thus after every two applications of P, the Euclidean distance to stationarity contracts by λ*. Adversarial moves cannot increase the distance.

Since the stationary distribution is uniform, the Euclidean distance is proportional to the L2(π) distance, and the same conclusion holds for the L2(π) distance as well. Thus after 2r(Q) steps the distance to stationarity reduces by a factor of 1/e. Since the worst initial value is at most √(nd), we have the result.

For each vertex u of X, let L−_u denote the span of vectors corresponding to edges coming into u and L+_u denote the span of vectors corresponding to edges leaving u. Also, let x−_u ∈ L−_u denote the vector which is 1 on edges coming into u and 0 elsewhere. Similarly define x+_u ∈ L+_u. Observe that the nd-dimensional space has the decomposition

⊕_u L−_u = ⊕_v L+_v

and also that P maps L−_u into L+_u: every incoming edge of u is mapped to the uniform combination (1/d)x+_u, so the matrix of P as a map from L−_u to L+_u is the d × d matrix J_d/d, where J_d is the all ones matrix, with singular values 1 (multiplicity 1) and 0 (multiplicity d − 1).

Since this holds for every vertex u, it follows that P has singular values 1 with multiplicity n, and 0 with multiplicity n(d − 1).

Let E denote the span of the n left singular vectors corresponding to the singular value 1. Note that E has orthonormal basis {x−_u/√d}_u and includes the stationary distribution. Note that the image of E under P has orthonormal basis {x+_u/√d}_u. For arbitrary vertices u, v of X, we have

⟨(x−_u/√d)P, x−_v/√d⟩ = ⟨x+_u/√d, x−_v/√d⟩ = Q(u, v)

Thus P considered as an operator on E behaves exactly like Q. Now take k = n in Theorem 4.10 and observe that E = E_n ⊕ ⟨π⟩, where ⟨π⟩ denotes the linear span of the uniform stationary distribution of Q. Since ⟨π⟩ is invariant under P, it follows that c_n = λ*(Q).

Thus Theorem 4.10 implies that for any initial distribution µ, two successive applications of P reduce the Euclidean norm of µ − π by a factor σ1(c_n + 2σ_n) = λ*(Q).
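The structure exploited in this proof can be illustrated on a concrete graph (numpy assumed; the 4-regular circulant on Z_9 is an arbitrary choice): the directed-edge walk P has singular values 1 with multiplicity n and 0 with multiplicity n(d − 1), and two applications of P contract vectors orthogonal to uniform by λ*(Q):

```python
import numpy as np

n, shifts = 9, [1, 2, -1, -2]          # circulant 4-regular graph on Z_9
d = len(shifts)
edges = [(u, (u + s) % n) for u in range(n) for s in shifts]
idx = {e: i for i, e in enumerate(edges)}

# Walk on directed edges: from (u, v) go to (v, w) for a uniform neighbor w of v.
P = np.zeros((n * d, n * d))
for (u, v) in edges:
    for s in shifts:
        P[idx[(u, v)], idx[(v, (v + s) % n)]] = 1.0 / d

S = np.sort(np.linalg.svd(P, compute_uv=False))[::-1]
assert np.allclose(S[:n], 1) and np.allclose(S[n:], 0)   # 1 (mult n), 0 (mult n(d-1))

# Q is the ordinary walk on the graph; lambda_* is its second-largest |eigenvalue|.
Q = np.zeros((n, n))
for u in range(n):
    for s in shifts:
        Q[u, (u + s) % n] += 1.0 / d
lam = np.sort(np.abs(np.linalg.eigvalsh(Q)))[-2]

pi = np.full(n * d, 1.0 / (n * d))
rng = np.random.default_rng(6)
mu = rng.dirichlet(np.ones(n * d))
z = mu - pi
assert np.linalg.norm(z @ P @ P) <= lam * np.linalg.norm(z) + 1e-10
```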

We conclude this section with another example similar to the "walk on directed edges" example.

Example 11. "Walk without immediate reversal": Let X be a connected d-regular undirected graph. Assume that the usual random walk on X is ergodic. Define a walk P as follows: if we are currently at vertex u, we move to a neighbor chosen uniformly at random, other than the vertex from which we came.

Like the "walk on directed edges" example, the states of this walk are the directed edges (u, v) of X. From (u, v) we move to (v, w), where w is a uniformly chosen neighbor of v other than u.

The adversarial strategy of reversing the orientation of an edge ensures that this walk has R(P) = ∞, since as before P maps the uniform distribution on edges into a vertex u to the uniform distribution on edges out of u.

The singular vectors of P are the same as those for the walk on directed edges, although the singular values are different. As in the "walk on directed edges" example, define subspaces L−_u and L+_v. The matrix for P as a map from L−_u to L+_u is the d × d-matrix (J_d − I)/(d − 1), where J_d is the all ones matrix. As before, (J_d − I)/(d − 1) is symmetric, with eigenvalues 1 and −1/(d − 1) (multiplicity d − 1).

Since this holds for all vertices u, P has singular values 1 with multiplicity n and 1/(d − 1) with multiplicity n(d − 1). A proof similar to Theorem 4.11 allows us to prove a decay factor of λ*(Q) + 2/(d − 1), where Q corresponds to the natural walk on X. Thus if X is a sufficiently good expander and the degree is large enough, in the sense that λ*(Q) + 2/(d − 1) < 1, the walk without immediate reversal is also rapidly mixing.

5 Cayley walks on groups


5.1 Walks against restricted adversaries

It is well known that knowledge of all the singular values of the transition matrix P can give better bounds on the standard mixing time than those obtained just from the knowledge of σ1(P). In this section we show that the same conclusion holds for the robust mixing time against holomorphic adversaries.

Definition 5.1. For a group G, the holomorph of G, denoted Hol(G), is the semi-direct product of G with Aut(G), where Aut(G) acts naturally on G:

Hol(G) = G ⋊ Aut(G)    (24)

Elements of Hol(G) are called holomorphisms of G.

Holomorphisms of G can be identified with permutations of G as follows: elements of G act on G by right translation and those of Aut(G) act naturally, i.e.,

(h, τ) · g = τ(g)h    (25)

Lemma 5.2. The permutation representation of Hol(G) is faithful and transitive.

Proof. Transitivity is easily established by setting τ to be the identity automorphism and choosing h appropriately. To see that this representation is faithful, suppose that for some (h, τ) ∈ Hol(G), we have

τ(g)h = g  ∀g ∈ G    (26)

Choosing g = id, we see that h = id, which in turn implies τ is the identity automorphism.

Hence we can identify holomorphisms of G with the permutations they induce on G (which was how we defined them in the introduction).

Definition 5.3. A permutation J : G → G is said to be G-respecting if for some permutation K : G → G, and all g, h ∈ G,

J(h)^{−1}J(g) = K(h^{−1}g)

Note that if K = J, then we get the definition of an automorphism of G.

Lemma 5.4. A permutation J of G is G-respecting iff it is a holomorphism of G.

Proof. Let 𝒢 denote the set of all G-respecting permutations of G. Hol(G) is closed under composition. Observe that 𝒢 is also closed under composition, since

J1(J2(h))^{−1} J1(J2(g)) = K1(J2(h)^{−1}J2(g)) = K1(K2(h^{−1}g))

where K1, K2 correspond to J1 and J2.


For a G-respecting permutation J, define J′(g) = g · (J(id))^{−1}. Then J′ ∈ Hol(G), since it is a right translation. Since J and J′ are G-respecting, J′′(g) = J′(J(g)) is also a G-respecting permutation, as G-respecting permutations are closed under composition. But

J′′(id) = J′(J(id)) = J(id) · J(id)^{−1} = id    (27)

Hence J′′ is a G-respecting permutation which fixes the identity. Hence, from the definition of G-respecting, substituting h = id, there is a permutation K′′ of G for which

J′′(id)^{−1} J′′(g) = K′′(id^{−1} g)    (28)

Since J′′(id) = id, we have K′′(g) = J′′(g). Substituting this back in the definition of G-respecting, we have for all g, h ∈ G,

J′′(h)^{−1} J′′(g) = J′′(h^{−1}g)    (29)

i.e., J′′ is an automorphism of G. For all g ∈ G, we now have

J(g) = J′′(g) · J(id) = (J(id), J′′) · g    (30)

Hence J ∈ Hol(G) and 𝒢 ≤ Hol(G).
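Lemma 5.4 can be verified exhaustively for a small cyclic group. The sketch below (pure Python; G = Z_5 is an arbitrary choice) enumerates all permutations of G, picks out the G-respecting ones, and checks that they are exactly the holomorphisms g ↦ τ(g)h = ug + h, with u a unit mod 5:

```python
from itertools import permutations
from math import gcd

n = 5
G = list(range(n))

def is_G_respecting(J):
    # J is G-respecting iff J(h)^{-1} J(g) depends only on h^{-1} g.
    K = {}
    for h in G:
        for g in G:
            key = (g - h) % n                 # h^{-1} g in the additive group Z_n
            val = (J[g] - J[h]) % n           # J(h)^{-1} J(g)
            if K.setdefault(key, val) != val:
                return False
    return True

respecting = {J for J in permutations(G) if is_G_respecting(J)}

# Holomorphisms of Z_n: g -> u*g + h with u a unit mod n (Aut) and h a translation.
units = [u for u in range(1, n) if gcd(u, n) == 1]
hol = {tuple((u * g + h) % n for g in G) for u in units for h in G}

assert respecting == hol
assert len(hol) == n * len(units)    # |Hol(G)| = |G| * |Aut(G)|
```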

Note that the holomorphic strategies H are precisely the convex combinations of Hol(G) (viewed as permutations on G), and the Cayley strategies C are precisely the convex combinations of the subgroup G of Hol(G), viewed as permutations on G.

We now look at the holomorphic robust mixing time of the Cayley walk on G. We identify a permutation of G with the |G| × |G| permutation matrix representing it.

Let P be the transition probability matrix of the Cayley walk on a group G. Fix a sequence {At}_{t>0} of holomorphic strategies and define

Q0 = I,    Q_{k+1} = Q_k P A_{k+1}

If µt denotes the distribution after t rounds, we have µt = µ0 Qt, where µ0 is the initial distribution.

Lemma 5.5. If µ0 is supported only at g ∈ G, then ||µt − π||²_2 = (Qt Qt^T)(g, g) − 1/N, where N = |G| and ||·||_2 denotes the Euclidean norm.

Proof.

||Qt(g, ·) − π||²_2 = Σ_h (Qt(g, h) − 1/N)²

= Σ_h Qt(g, h)² − 2Qt(g, h)/N + 1/N²

= Σ_h Qt(g, h)² − 1/N

= (Qt Qt^T)(g, g) − 1/N
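The identity of Lemma 5.5 uses only that Qt is stochastic with uniform stationary distribution; a quick numpy check (with an arbitrary doubly stochastic Qt standing in for a product of copies of P and adversarial permutations):

```python
import numpy as np

rng = np.random.default_rng(7)
N = 6

def rand_doubly_stochastic(rng, n):
    # convex combination of permutation matrices (Birkhoff form)
    perms = [np.eye(n)[rng.permutation(n)] for _ in range(4)]
    wts = rng.dirichlet(np.ones(4))
    return sum(w * Pm for w, Pm in zip(wts, perms))

# Q_t is a product of doubly stochastic matrices, hence doubly stochastic.
Qt = rand_doubly_stochastic(rng, N) @ rand_doubly_stochastic(rng, N)

g = 2
mu0 = np.zeros(N); mu0[g] = 1.0        # start concentrated at g
pi = np.full(N, 1.0 / N)

lhs = np.linalg.norm(mu0 @ Qt - pi) ** 2
rhs = (Qt @ Qt.T)[g, g] - 1.0 / N
assert np.isclose(lhs, rhs)
```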
