
Torpid Mixing of Some Monte Carlo Markov Chain Algorithms in Statistical Physics

Christian Borgs∗, Jennifer T. Chayes∗, Alan Frieze†, Jeong Han Kim∗, Prasad Tetali‡, Eric Vigoda§, Van Ha Vu¶

Abstract

We study two widely used algorithms, Glauber dynamics and the Swendsen-Wang algorithm, on rectangular subsets of the hypercubic lattice $\mathbb{Z}^d$. We prove that under certain circumstances, the mixing time in a box of side length $L$ with periodic boundary conditions can be exponential in $L^{d-1}$. In other words, under these circumstances, the mixing in these widely used algorithms is not rapid; instead, it is torpid. The models we study are the independent set model and the $q$-state Potts model. For both models, we prove that Glauber dynamics is torpid in the region with phase coexistence. For the Potts model, we prove that Swendsen-Wang is torpid at the phase transition point.

1 Introduction

Monte Carlo Markov chains (MCMC) are used in computer science to design algorithms for estimating the size of large combinatorially defined structures. In statistical physics, they are used to study the behavior of idealized models of physical systems in equilibrium. In the latter case, the models of interest are usually defined on regular, finite-dimensional structures such as the hypercubic lattice $\mathbb{Z}^d$. In both applications, it is necessary to run the chain, $M$, until it is close enough to its steady state. Thus it is important to design rapidly mixing algorithms, i.e. algorithms for which the mixing time, $\tau_M$, is small.

∗ Microsoft Research, 1 Microsoft Way, Redmond, WA 98052, borgs@microsoft.com, jchayes@microsoft.com, jehkim@microsoft.com

† Department of Mathematics, Carnegie Mellon U., Pittsburgh, PA 15213, alan@random.math.cmu.edu; research supported in part by the NSF Grant No. CCR–9530974, and by a Guggenheim fellowship

‡ School of Mathematics, Georgia Tech, Atlanta, GA 30332-0160, tetali@math.gatech.edu; research supported in part by the NSF Grant No. DMS–9800351

§ Department of Computer Science, Berkeley, CA, 94720, vigoda@cs.berkeley.edu; research supported in part by a GAANN graduate fellowship and NSF grant CCR–9505448

¶ Institute for Advanced Study, Princeton, Olden Lane, NJ 08540, van-havu@math.ias.edu

In this paper, we study two statistical physics models, the $q$-state Potts model and the independent set problem. We consider these models on the graphs on which they are most often studied in physical applications, namely on subsets of $\mathbb{Z}^d$. For the Potts model, we study two types of Monte Carlo Markov chains: Glauber dynamics, and the empirically more rapid Swendsen-Wang dynamics.

Both the Potts model and the independent set model are characterized by non-negative parameters, the former by a so-called inverse temperature $\beta$, and the latter by a so-called fugacity $\lambda$ (see definitions below). Both models are known to undergo phase transitions from a so-called disordered phase with a unique equilibrium state to an ordered phase with multiple equilibrium states. Due to the multiple states, the ordered phase is also known as the region of phase coexistence.

The point of this work is to relate the mixing times of the MCMCs to the phase structures of the underlying equilibrium models. In particular, we show that Glauber dynamics is slow, or torpid, for both models in their regions of phase coexistence, while Swendsen-Wang for the Potts model is torpid at the phase transition point. This latter result has apparently come as a surprise to some physicists who use the Swendsen-Wang algorithm to simulate the Potts model, and who have tacitly assumed that it mixes rapidly for all values of the inverse temperature.

In addition to this “physically surprising” result, our work is new in a number of respects. While there has recently been a good deal of work in the theoretical CS community on slowness of Swendsen-Wang dynamics for the Potts model (see citations below), this is one of the first works to consider the physically relevant case of the hypercubic lattice $\Gamma_d = \mathbb{Z}^d$ and finite portions thereof. (In


statistical physics expansion techniques for the problem of controlling the number of cutsets in graphical expansions of these models. Specifically, we use so-called Pirogov-Sinai theory [21] from the statistical physics literature, in the form adapted to the Potts model by Borgs, Kotecký and Miracle-Solé ([4], [5]). We also use the new and powerful combinatoric estimates of Lebowitz and Mazel [17] for controlling the number of cutsets. Finally, we use the lovely isoperimetric inequalities of Bollobás and Leader [2].

In this introduction, we will first describe our work on MCMC for the Potts model, and then for the independent set problem.

The $q$-state Potts Model (see [26], [27]) on an arbitrary graph $G = (V, E)$, $|V| = n$, is defined as follows: a coloring $\sigma$ is a map from $V \to [q] = \{1, 2, \ldots, q\}$. Let $D(\sigma)$ be the set of edges with endpoints of a different color and let $d(\sigma) = |D(\sigma)|$. The weight of a coloring $w(\sigma)$ is $e^{-\beta d(\sigma)}$. We turn this into a probability distribution $\mu$ by normalizing with the partition function $Z = \sum_{\sigma} w(\sigma)$. To study this model empirically, one needs to be able to generate $\sigma$ with probability (close to)

$$\mu(\sigma) = \frac{w(\sigma)}{Z}. \qquad (1)$$

The model is said to be ferromagnetic if $\beta \ge 0$, otherwise it is anti-ferromagnetic. Note that $\beta = -\infty$ corresponds to random proper colorings.
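As an illustration of definition (1), the following minimal Python sketch enumerates all colorings of a small graph and normalizes their weights. It is a brute-force check only; the function name `potts_distribution`, the 4-cycle example, and the use of colors `0..q-1` instead of `1..q` are assumptions made for this sketch and are not part of the paper.

```python
import itertools
import math

def potts_distribution(edges, n, q, beta):
    """Brute-force Potts measure: returns ({coloring: mu(coloring)}, Z).

    Colorings are tuples of length n with entries in 0..q-1 (an
    illustrative relabeling of the paper's colors 1..q).
    """
    weights = {}
    for sigma in itertools.product(range(q), repeat=n):
        # d(sigma): number of edges whose endpoints have different colors
        d = sum(1 for (u, v) in edges if sigma[u] != sigma[v])
        weights[sigma] = math.exp(-beta * d)
    Z = sum(weights.values())  # partition function
    return {sigma: w / Z for sigma, w in weights.items()}, Z

# Hypothetical example: the 4-cycle with q = 3 colors.
cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
mu, Z = potts_distribution(cycle_edges, n=4, q=3, beta=1.0)
```

For $\beta > 0$ the monochromatic colorings carry the largest weight, and for $\beta = 0$ the partition function reduces to $q^n$. The enumeration costs $q^n$ operations, which is why sampling methods are needed for graphs of any realistic size.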

The widely used Swendsen-Wang algorithm [24] for the ferromagnetic model uses a Markov chain with state space $[q]^V$ which has steady state $\mu$ (see Section 3). Gore and Jerrum [13] proved that on the complete graph $K_n$ with $q \ge 3$, there is a certain value of $\beta$ (inverse temperature) such that the mixing time of the algorithm is exponential in $n$. Jerrum [15] has coined the phrase torpid mixing to describe this phenomenon. Cooper and Frieze [7] extended their arguments to show that in the Potts model on the random graph $G_{n,p}$, this phenomenon persists with high probability for $p = \Omega(n^{-1/3})$. Li and Sokal [18] proved a linear (in the number of sites) lower bound for finite boxes in $\mathbb{Z}^d$. (For positive results on this algorithm see [7], Cooper, Dyer, Frieze and Rue [6], Huber [14], Martinelli [20].)

Our first result concerns this algorithm and the simpler Glauber dynamics (see Section 3). Let $T = T_{L,d} = (\mathbb{Z}/L\mathbb{Z})^d$ be the $d$-dimensional torus of side $L$. We view this as a graph where two points are connected by an edge if they differ by 1 (mod $L$) in one component. It has vertex set $V = V_{L,d}$ and edge set $E = E_{L,d}$. Using the results of Borgs, Kotecký and Miracle-Solé [5], we prove the following negative result:

Theorem 1 For $d \ge 2$ and sufficiently large $q$, there exists $\beta_c = \beta_c(q, d)$ such that:

(a) The mixing time $\tau_{SW}$ of the Swendsen-Wang algorithm on $T_{L,d}$ at $\beta_c$ satisfies

$$\tau_{SW} \ge e^{K_1 L/(\log L)^2}$$

for some absolute constant $K_1 > 0$.

(b) The mixing time $\tau_{GD}$ of the Glauber dynamics for $\beta \ge \beta_c$ satisfies

$$\tau_{GD} \ge e^{K_2 L/(\log L)^2}$$

for some absolute constant $K_2 > 0$.

For an arbitrary graph $G = (V, E)$, an independent set is a set of vertices $I \subset V$ such that no pair of vertices $i, j \in I$ is incident to the same edge $e \in E$. Dyer, Frieze and Jerrum [9] considered the problem of generating a nearly random independent set of a bipartite graph. They prove that Glauber dynamics exhibits torpid mixing on almost all regular graphs of degree 6 or more and that the problem is NP-hard for regular graphs of degree 25 or more. In statistical physics, the independent set problem is called the hard-core gas model. In general there is a parameter $\lambda > 0$ called the fugacity and one wants to generate independent sets $I$ with probability proportional to $\lambda^{|I|}$, i.e.

$$\mu(I) = \frac{\lambda^{|I|}}{\sum_{J \text{ independent}} \lambda^{|J|}}. \qquad (2)$$
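To make the measure (2) concrete, here is a small brute-force sketch in Python that lists the independent sets of a graph and assigns each the probability $\lambda^{|I|}/Z$. The helper names `independent_sets` and `hard_core_distribution` and the 4-cycle example are hypothetical choices for illustration.

```python
from itertools import combinations

def independent_sets(n, edges):
    """Yield every independent set of the graph on vertices 0..n-1."""
    edge_set = {frozenset(e) for e in edges}
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            # keep the subset only if no two chosen vertices share an edge
            if all(frozenset(pair) not in edge_set
                   for pair in combinations(subset, 2)):
                yield frozenset(subset)

def hard_core_distribution(n, edges, lam):
    """Return {I: mu(I)} with mu(I) proportional to lam ** |I|."""
    weights = {I: lam ** len(I) for I in independent_sets(n, edges)}
    Z = sum(weights.values())
    return {I: w / Z for I, w in weights.items()}

# Hypothetical example: the 4-cycle at fugacity 2.
square = [(0, 1), (1, 2), (2, 3), (3, 0)]
mu_hc = hard_core_distribution(4, square, lam=2.0)
```

On the 4-cycle there are seven independent sets (the empty set, four singletons, and the two diagonal pairs), so the normalizing sum at $\lambda = 2$ is $1 + 4\cdot 2 + 2\cdot 4 = 17$. As $\lambda$ grows, the mass concentrates on the two maximum independent sets, which is a toy version of the even/odd phase coexistence studied on the torus.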

Our second result concerns this problem. The Glauber dynamics chain is a simple chain on the independent sets of a graph $G$ that selects a random vertex and adds/deletes it to/from the current independent set with some probability dependent on $\lambda$ (see Section 3). Dyer and Greenhill [10] and Luby and Vigoda [19] have proved that this chain is rapidly mixing for $\lambda < \frac{2}{\Delta - 2}$, where $\Delta$ denotes the maximum degree of $G$.

We also prove bounds on more general Markov chains. To define this class, let $I, I'$ be independent sets, and let $D(I, I') = |I \setminus I'| + |I' \setminus I|$. For an ergodic Markov chain $M_L$ on $T_{L,d}$, let $D_{M_L}$ be the maximum of $D(I, I')$ over all $I$ and $I'$ for which the transition probability is non-zero. We say that $M_L$ is local if $D_{M_L}$ is bounded uniformly in $L$, and we say that it is $\rho$-quasi-local if $D_{M_L} \le \rho L^d$ for some $\rho < 1$ which is independent of $L$.

Theorem 2 For $d \ge 2$ and $\lambda$ sufficiently large, the mixing time $\tau_{GD}$ of the Glauber chain on $T_{L,d}$ satisfies

$$\tau_{GD} \ge e^{K_3 L^{d-1}/(\log L)^2}$$

for some constant $K_3 > 0$ depending only on the dimension $d$. More generally, let $\tau_L$ be the mixing time for any ergodic Markov chain on $T_{L,d}$ which is $\rho$-quasi-local for some $\rho < 1$. Then

$$\tau_L \ge e^{K_4 L^{d-1}/(\log L)^2}$$


Finally, we want to point out that our techniques for proving slow mixing are quite robust, and can be applied to many models exhibiting the phenomenon of phase coexistence. All that we require is that the equilibrium model have energy barriers between different phases that are high enough to apply the techniques of [3], and that the dynamics is not sufficiently global to permit jumps over these barriers. (An example of a Markov chain which “jumps over” energy barriers is the Swendsen-Wang algorithm at temperatures below the transition temperature.)

2 Mixing Time

Let $M$ be an ergodic Markov chain on a finite state space $\Omega$, with transition probabilities $P(x, y)$, $x, y \in \Omega$. Let $\pi$ denote the stationary distribution of $M$.

Let $x \in \Omega$ be an arbitrary fixed state, and denote by $P_{t,x}(\omega)$ the probability that the system is in state $\omega$ at time $t$ given that $x$ is the initial state.

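The variation distance $\Delta(\pi_1, \pi_2)$ between two distributions $\pi_1, \pi_2$ on $\Omega$ is defined by

$$\Delta(\pi_1, \pi_2) = \max_{S \subseteq \Omega} |\pi_1(S) - \pi_2(S)| = \frac{1}{2} \sum_{\omega \in \Omega} |\pi_1(\omega) - \pi_2(\omega)|.$$

The variation distance at time $t$ with respect to the initial state $x$ is then defined as

$$\Delta_x(t) = \Delta(P_{t,x}, \pi).$$

We define the function $d(t) = \max_{x \in \Omega} \Delta_x(t)$ and the mixing time

$$\tau = \min\{t : 2d(t) \le e^{-1}\}.$$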

A property of $d(t)$ given in [1] is that $d(s+t) \le 2d(s)d(t)$, implying in particular that $d(t) \le \exp(-\lfloor t/\tau \rfloor)$. It is therefore both necessary and sufficient that chains be run for some multiple of the mixing time in order to get a sample which is close to a sample from the steady state.
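The definitions of variation distance, $d(t)$ and $\tau$ can be evaluated directly for a chain small enough to store as a matrix. The sketch below does this in plain Python; the function names and the two-state example chain are assumptions chosen for illustration, and the stationary distribution is supplied by hand rather than computed.

```python
import math

def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def variation_distance(p, q):
    """Half the L1 distance between two distributions on the same finite set."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def mixing_time(P, pi, t_max=10_000):
    """Smallest t >= 1 with 2 * d(t) <= 1/e, or None if not reached by t_max."""
    n = len(P)
    Pt = [row[:] for row in P]  # Pt holds the t-step matrix, starting at t = 1
    for t in range(1, t_max + 1):
        # d(t): worst case over initial states x of the distance to pi
        d_t = max(variation_distance(Pt[x], pi) for x in range(n))
        if 2 * d_t <= math.exp(-1):
            return t
        Pt = mat_mul(Pt, P)
    return None

# Hypothetical two-state chain that switches state with probability a.
a = 0.1
P = [[1 - a, a], [a, 1 - a]]
pi = [0.5, 0.5]
tau = mixing_time(P, pi)
```

For this chain $2d(t) = (1-2a)^t = 0.8^t$, which first drops below $e^{-1}$ at $t = 5$. The state spaces considered in this paper are exponentially large in $L^d$, so such a direct computation is only a sanity check on the definitions.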

For our purposes, the Swendsen-Wang algorithm is rapidly mixing if its mixing time $\tau_{SW}$ is bounded by a polynomial in $n$, the number of vertices of $G$. Similarly for the Glauber chain.

Jerrum and Sinclair [23] introduced the notion of conductance to the study of finite time reversible Markov chains. A chain is reversible if it satisfies the detailed balance equations:

$$\pi(x)P(x, y) = \pi(y)P(y, x), \quad \text{for all } x, y \in \Omega.$$

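Putting $Q(x, y) = \pi(x)P(x, y)$ and $Q(A, B) = \sum_{(x,y) \in A \times B} Q(x, y)$, we define the conductance of a set of states $\emptyset \ne S \subset \Omega$ as

$$\Phi_S = \frac{Q(S, \bar{S})}{\pi(S)\pi(\bar{S})} \quad \text{where } \bar{S} = \Omega \setminus S. \qquad (3)$$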

The conductance $\Phi_M$ of the chain itself is simply $\min_{S \ne \emptyset} \Phi_S$. We prove our lower bounds on mixing time by showing that $\Phi_M$ is small and then using the well-known bound [1]

$$e^{-1/\tau_M} \ge 1 - \Phi_M. \qquad (4)$$
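For a very small chain, the conductance (3) can be computed by enumerating every nonempty proper subset $S$, and (4) can be rearranged into the explicit lower bound $\tau_M \ge -1/\ln(1 - \Phi_M)$ whenever $0 < \Phi_M < 1$. The Python sketch below is an illustration under those assumptions; the function names and the two-state example are hypothetical.

```python
import math
from itertools import combinations

def conductance(P, pi):
    """Minimum of Q(S, S-bar) / (pi(S) * pi(S-bar)) over nonempty proper S."""
    n = len(P)
    best = float("inf")
    for k in range(1, n):
        for S in combinations(range(n), k):
            in_S = set(S)
            complement = [y for y in range(n) if y not in in_S]
            # Q(S, S-bar): stationary probability flow leaving S in one step
            flow = sum(pi[x] * P[x][y] for x in S for y in complement)
            pi_S = sum(pi[x] for x in S)
            best = min(best, flow / (pi_S * (1.0 - pi_S)))
    return best

def mixing_time_lower_bound(phi):
    """Rearrangement of e^{-1/tau} >= 1 - phi, valid for 0 < phi < 1."""
    return -1.0 / math.log(1.0 - phi)

# Hypothetical two-state chain that switches state with probability 0.1.
P2 = [[0.9, 0.1], [0.1, 0.9]]
pi2 = [0.5, 0.5]
phi = conductance(P2, pi2)
bound = mixing_time_lower_bound(phi)
```

Here $\Phi = 0.05/0.25 = 0.2$, giving a lower bound of roughly 4.48 steps. The proofs in this paper never enumerate subsets; they exhibit one specific set $S$ with small $\Phi_S$, which is enough because $\Phi_M$ is a minimum.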

3 MCMC Algorithms

There are several MCMC algorithms that are used to generate a random sample from the distributions corresponding to the hard-core model and the ferromagnetic Potts model. The Glauber dynamics is perhaps the simplest such Markov chain. Its transitions are as follows: Choose a vertex at random, and modify the spin of that vertex by choosing from the distribution conditional on the spins of the other vertices remaining the same. We will detail the algorithm for the hard-core model on independent sets.

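Glauber Dynamics: From an independent set $\sigma$,

G1 Choose $v$ uniformly at random from $V$.

G2 Let

$$\sigma' = \begin{cases} \sigma \cup \{v\} & \text{with probability } \lambda/(1+\lambda) \\ \sigma \setminus \{v\} & \text{with probability } 1/(1+\lambda). \end{cases}$$

G3 If $\sigma'$ is an independent set, then move to $\sigma'$, otherwise stay at the current independent set $\sigma$.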
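Steps G1 to G3 translate almost line by line into code. The following Python sketch is one possible rendering, not the authors' implementation; the adjacency-dictionary representation, the explicit `rng` argument, and the 6-cycle example are assumptions made for illustration.

```python
import random

def glauber_step(sigma, vertices, neighbors, lam, rng):
    """One Glauber update of the independent set `sigma` (a frozenset)."""
    v = rng.choice(vertices)                    # G1: uniform random vertex
    if rng.random() < lam / (1.0 + lam):        # G2: propose adding v ...
        proposal = sigma | {v}
    else:                                       # ... or removing v
        proposal = sigma - {v}
    # G3: only v changed, so independence can fail only at v's neighbours
    if v in proposal and any(u in sigma for u in neighbors[v]):
        return sigma
    return proposal

# Hypothetical example: 1000 steps on the 6-cycle at fugacity 2.
n_cycle = 6
cycle_vertices = list(range(n_cycle))
cycle_neighbors = {v: [(v - 1) % n_cycle, (v + 1) % n_cycle]
                   for v in cycle_vertices}
rng = random.Random(0)
state = frozenset()
visited = [state]
for _ in range(1000):
    state = glauber_step(state, cycle_vertices, cycle_neighbors,
                         lam=2.0, rng=rng)
    visited.append(state)
```

Each step changes at most one vertex, so in the terminology of the introduction this chain is local with $D(I, I') \le 1$ for every allowed transition.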

For the ferromagnetic Potts model, an alternative method, the Swendsen-Wang process [24], is often preferred over other dynamics in Markov chain Monte Carlo simulations.

Swendsen-Wang Algorithm:

SW1 Let $B = E \setminus D(\sigma)$ be the set of edges joining vertices of the same color. Delete each edge of $B$ independently with probability $1 - p$, where $p = 1 - e^{-\beta}$. This gives a subset $A$ of $B$.

SW2 The graph $(V, A)$ consists of connected components. For each component a colour (spin) is chosen uniformly at random from $[q]$ and all vertices within the component are assigned that colour (spin).
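Steps SW1 and SW2 can be sketched as follows. This is an illustrative Python version under stated assumptions: vertices are `0..n-1`, colors are `0..q-1` rather than `1..q`, components are found with a simple union-find structure, and the function name and example graph are hypothetical.

```python
import math
import random

def swendsen_wang_step(sigma, edges, q, beta, rng):
    """One Swendsen-Wang update; returns (new coloring, kept edge set A)."""
    n = len(sigma)
    p = 1.0 - math.exp(-beta)
    # SW1: keep each monochromatic edge independently with probability p
    A = [(u, v) for (u, v) in edges
         if sigma[u] == sigma[v] and rng.random() < p]

    # SW2: find the connected components of (V, A) with union-find
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in A:
        parent[find(u)] = find(v)

    # give every component a fresh colour, uniform over the q colours
    colour_of_root = {}
    new_sigma = []
    for v in range(n):
        root = find(v)
        if root not in colour_of_root:
            colour_of_root[root] = rng.randrange(q)
        new_sigma.append(colour_of_root[root])
    return new_sigma, A

# Hypothetical example: one step on the 6-cycle from the all-zero coloring.
sw_edges = [(i, (i + 1) % 6) for i in range(6)]
sw_rng = random.Random(1)
sigma0 = [0] * 6
sigma1, A1 = swendsen_wang_step(sigma0, sw_edges, q=3, beta=1.0, rng=sw_rng)
```

Unlike Glauber dynamics, a single step can recolor a whole component at once, so the chain is not local in the sense used for Theorem 2.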


Given a graph $G = (V, E)$, let $G(A) = (V, A)$ denote the subgraph of $G$ induced by the edge set $A \subseteq E$. The random cluster model assigns to $G(A)$ the measure

$$\mu(G(A)) = \frac{1}{Z}\, p^{|A|} (1-p)^{|E| - |A|} q^{c(A)}, \qquad (5)$$

where $c(A)$ is the number of components of $G(A)$ and $p$ is a probability.

The relationship between the two models is elucidated in a paper by Edwards and Sokal [11]. The Potts and random cluster models are defined on a joint probability space $[q]^n \times 2^E$. The joint probability $\pi(\sigma, A)$ is defined by

$$\pi(\sigma, A) = \frac{1}{Z} \prod_{(i,j) \in E} \left( (1-p)\,\delta_{(i,j) \notin A} + p\,\delta_{(i,j) \in A}\,\delta_{\sigma_i = \sigma_j} \right), \qquad (6)$$

where $Z$ is a normalizing constant. By summing over $\sigma$ or $A$ we see that the marginal distributions are correct, and (remarkably) the normalising constants in both Potts and Cluster models are the value of $Z$ given in the expression above.
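The claim about the marginals of (6) can be checked numerically on a tiny graph. The Python sketch below sums the unnormalized joint weight over $A$ and over $\sigma$ and compares the results with $e^{-\beta d(\sigma)}$ from (1) and with $p^{|A|}(1-p)^{|E|-|A|}q^{c(A)}$ from (5). The function names, the bitmask encoding of edge subsets, and the triangle example are assumptions for illustration only.

```python
import itertools
import math

def count_components(n, edge_list):
    """Number of connected components of the graph (0..n-1, edge_list)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edge_list:
        parent[find(u)] = find(v)
    return len({find(x) for x in range(n)})

def edwards_sokal_marginal_errors(n, edges, q, beta):
    """Largest deviation of each marginal of (6) from the weights in (1) and (5)."""
    p = 1.0 - math.exp(-beta)
    m = len(edges)
    potts = {}    # coloring -> sum over A of the unnormalized joint weight
    cluster = {}  # edge-subset bitmask -> sum over colorings
    for sigma in itertools.product(range(q), repeat=n):
        for mask in range(1 << m):
            w = 1.0
            for idx, (i, j) in enumerate(edges):
                if (mask >> idx) & 1:      # edge is in A
                    w *= p if sigma[i] == sigma[j] else 0.0
                else:                      # edge is not in A
                    w *= 1.0 - p
            potts[sigma] = potts.get(sigma, 0.0) + w
            cluster[mask] = cluster.get(mask, 0.0) + w
    err_potts = max(
        abs(w - math.exp(-beta * sum(s[i] != s[j] for i, j in edges)))
        for s, w in potts.items())
    err_cluster = 0.0
    for mask, w in cluster.items():
        A = [e for idx, e in enumerate(edges) if (mask >> idx) & 1]
        expected = (p ** len(A) * (1.0 - p) ** (m - len(A))
                    * q ** count_components(n, A))
        err_cluster = max(err_cluster, abs(w - expected))
    return err_potts, err_cluster

# Hypothetical example: the triangle with q = 3.
triangle = [(0, 1), (1, 2), (0, 2)]
err_potts, err_cluster = edwards_sokal_marginal_errors(3, triangle, q=3, beta=0.7)
```

Both errors vanish up to floating-point rounding, because each edge factor sums to $1$ or $1 - p = e^{-\beta}$ over its two states, and because exactly $q^{c(A)}$ colorings are constant on every component of $(V, A)$.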

The Swendsen-Wang algorithm can be seen as follows: given $\sigma$, (i) choose a random $A'$ according to $\pi(\sigma, A')$ and then (ii) choose a random $\sigma'$ according to $\pi(\sigma', A')$.

After Step SW1 we say that we are in the FK representation of the chain.

4 Minimal Cutsets

Let $G = (V, E)$ be a connected graph. For $W \subset V$ we define $G_W$ as the graph $(W, E_W)$, where $E_W$ is the set of all edges in $E$ that join two vertices in $W$. We say that $C \subset W$ is a component of $W$ if $C$ is the vertex set of a component of $G_W$. As usual, we define a subset $\gamma \subset E$ to be a cutset if $(V, E \setminus \gamma)$ is disconnected. We define $\gamma$ to be a minimal cutset if all cutsets contained in $\gamma$ are identical to $\gamma$. If $\gamma$ is minimal, $(V, E \setminus \gamma)$ has exactly two components. For $W \subset V$, we let $\overline{W}$ denote the complement of $W$, i.e. $\overline{W} = V \setminus W$. We denote the set of edges between two disjoint sets of vertices $W$ and $W'$ by $(W : W')$. Finally, we use $\mathcal{C}(W)$ to denote the set of components of $W$.

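We consider the cutset $\partial W = (W : \overline{W})$ and decompose it as $\partial W = \cup_{C \in \mathcal{C}(W)} \partial C$. We will further decompose $\partial C$ into minimal cutsets, see Lemma 1 below. In order to state the lemma, we introduce the sets

$$\Gamma_C = \{(C : D) \mid D \in \mathcal{C}(\overline{C})\} = \{(D : \overline{D}) \mid D \in \mathcal{C}(\overline{C})\}$$

and

$$\Gamma(W) = \bigcup_{C \in \mathcal{C}(W)} \Gamma_C.$$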
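The sets $\Gamma_C$ and $\Gamma(W)$ can be computed mechanically for a small graph, which may help in reading the lemma that follows. The Python sketch below is a hypothetical illustration: it finds the components $C$ of $W$, then the components $D$ of each complement $\overline{C}$, and collects the edge boundaries $(D : \overline{D})$. The function names and the path-graph example are assumptions.

```python
def components(vertices, edges):
    """Vertex sets of the connected components of the subgraph induced on `vertices`."""
    vertices = set(vertices)
    adj = {v: set() for v in vertices}
    for u, v in edges:
        if u in vertices and v in vertices:
            adj[u].add(v)
            adj[v].add(u)
    seen, comps = set(), []
    for start in vertices:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:                      # depth-first search from `start`
            x = stack.pop()
            if x in comp:
                continue
            comp.add(x)
            stack.extend(adj[x] - comp)
        seen |= comp
        comps.append(frozenset(comp))
    return comps

def edge_boundary(D, edges):
    """The edge set (D : complement of D), each edge stored as a frozenset."""
    return frozenset(frozenset(e) for e in edges if (e[0] in D) != (e[1] in D))

def gamma(W, V, edges):
    """The collection Gamma(W) of cutsets (D : D-bar), D a component of C-bar."""
    cutsets = set()
    for C in components(W, edges):
        for D in components(set(V) - C, edges):
            cutsets.add(edge_boundary(D, edges))
    return cutsets

# Hypothetical example: the path 0-1-2-3-4 with W = {2}.
path_vertices = range(5)
path_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
cutsets = gamma({2}, path_vertices, path_edges)
```

In this example $\overline{C}$ has the two components $\{0,1\}$ and $\{3,4\}$, so $\partial C$ splits into the two single-edge cutsets $\{\{1,2\}\}$ and $\{\{2,3\}\}$, which are disjoint and whose union is $\partial W$.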

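Lemma 1 Consider $W \subset V$.

(a) Let $C, C'$ be different components of $W$. There exist unique $D \in \mathcal{C}(\overline{C})$ and $D' \in \mathcal{C}(\overline{C'})$ such that $\overline{D} \subseteq D'$ or equivalently $\overline{D'} \subseteq D$.

(b) For $C \in \mathcal{C}(W)$, $\partial C$ has a unique decomposition into minimal cutsets as $\partial C = \cup_{\gamma \in \Gamma_C} \gamma$.

(c) If $\gamma, \gamma' \in \Gamma(W)$ are distinct then they are disjoint.

(d) Let $C$ and $C'$ be two (not necessarily distinct) components of $W \subset V$. If $X$ or $\overline{X}$ is a component of $\overline{C}$ and $Y$ or $\overline{Y}$ is a component of $\overline{C'}$ then

$$X \cap Y = \emptyset, \quad X \cap \overline{Y} = \emptyset, \quad \overline{X} \cap Y = \emptyset, \quad \text{or} \quad \overline{X} \cap \overline{Y} = \emptyset.$$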

Proof:

(a) We will first prove uniqueness. Since $C \cap C' = \emptyset$, $C' \subset \overline{C}$ and $C \subset \overline{C'}$. Furthermore, $C$ is connected. Hence, there exists a unique $D' \in \mathcal{C}(\overline{C'})$ such that $C \subset D'$. For all $D \in \mathcal{C}(\overline{C})$, $C \subset \overline{D}$. Therefore, if there exists a $D' \in \mathcal{C}(\overline{C'})$ with $\overline{D} \subset D'$, $D'$ must be the unique component containing $C$. The uniqueness of $D$ is proved similarly. Next, we prove existence. Let $D'$ be as above, so that $\overline{D'} \subset \overline{C}$. Since $C' \cup D''$ is connected for all $D'' \in \mathcal{C}(\overline{C'})$, the set $\overline{D'} = C' \cup \bigcup \{D'' \in \mathcal{C}(\overline{C'}) : D'' \ne D'\}$ is connected. As a consequence, $\overline{D'} \subset \overline{C}$ must lie in one of the components $D$ of $\overline{C}$.

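(b) Obviously, $\partial C = \cup_{\gamma \in \Gamma_C} \gamma$ is a decomposition of $\partial C$ into minimal cutsets of $G$. To prove uniqueness, assume that $\gamma \subset \partial C$ is a minimal cutset of $G$. Then there exists a $D \in \mathcal{C}(\overline{C})$ such that $(D : \overline{D}) \subset \gamma$. Otherwise, $C \cup D$ is connected in $G \setminus \gamma$ for every $D \in \mathcal{C}(\overline{C})$, which would imply that $G \setminus \gamma$ is connected. Since $\gamma$ is minimal, $(D : \overline{D}) \subset \gamma$ implies $(D : \overline{D}) = \gamma$.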

(c) For cutsets $\gamma$ and $\gamma'$ corresponding to the same component $C$, disjointness follows from the explicit form given in (b). Assume that $\gamma \cap \gamma' \ne \emptyset$ for two different components $C$ and $C'$. This would imply that $\partial C \cap \partial C' \ne \emptyset$, which in turn implies that $C$ and $C'$ are connected in $G$, and hence in $G_W$. But this contradicts the assumption that $C$ and $C'$ are different components of $G_W$.

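(d) Without loss of generality $X \in \mathcal{C}(\overline{C})$ and $Y \in \mathcal{C}(\overline{C'})$. We consider several cases:

• If $X = Y$ then $X \cap \overline{Y} = \emptyset$.

• If $C = C'$ and $X \ne Y$, then $X$ and $Y$ are different components of $\overline{C}$, which implies that $X \cap Y = \emptyset$.

• If $C \ne C'$ then we use part (a) of this lemma. We condition on whether $X$ and/or $Y$ are the unique $D \in \mathcal{C}(\overline{C})$ and $D' \in \mathcal{C}(\overline{C'})$ such that $\overline{D} \subseteq D'$.

– $X \ne D$, $Y \ne D'$: Since $Y \subset \overline{D'}$ and part (a) implies that $X \subset \overline{D} \subset D'$, we have that $X \cap Y = \emptyset$.

– $X \ne D$, $Y = D'$: We saw in the previous case that $X \subset D'$, and thus $X \cap \overline{Y} = \emptyset$. The case when $X = D$, $Y \ne D'$ is handled in the same way.

– $X = D$, $Y = D'$: Since $\overline{X} \subset Y$ by part (a), $\overline{X} \cap \overline{Y} = \emptyset$.

✷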

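Let $\gamma = (D : \overline{D})$ be a minimal cutset of $G$; in particular $D$ and $\overline{D}$ are connected. We then define $\mathrm{Int}\,\gamma$ as the smaller (in terms of cardinality) of $D$ and $\overline{D}$. If $D$ and $\overline{D}$ have the same size, we can define $\mathrm{Int}\,\gamma$ as either $D$ or $\overline{D}$. For definiteness, we define $\mathrm{Int}\,\gamma$ as the one containing a fixed point $x_o \in V$. For a cutset $\gamma$ we define $\mathrm{Ext}\,\gamma = V \setminus \mathrm{Int}\,\gamma$, and for a collection $\Gamma$ of minimal cutsets, we define the interior of $\Gamma$ and the common exterior of $\Gamma$ as

$$\mathrm{Int}\,\Gamma = \bigcup_{\gamma \in \Gamma} \mathrm{Int}\,\gamma \quad \text{and} \quad \mathrm{Ext}\,\Gamma = \bigcap_{\gamma \in \Gamma} \mathrm{Ext}\,\gamma.$$

Note that $\mathrm{Int}\,\Gamma \cup \mathrm{Ext}\,\Gamma = V$ for all sets $\Gamma$ of minimal cutsets.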

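Lemma 2 Let $W \subset V$.

(a) Let $\gamma, \gamma' \in \Gamma(W)$. If $\mathrm{Int}\,\gamma \cap \mathrm{Int}\,\gamma' \ne \emptyset$, then either $\mathrm{Int}\,\gamma \subset \mathrm{Int}\,\gamma'$ or $\mathrm{Int}\,\gamma' \subset \mathrm{Int}\,\gamma$.

(b) Either $W$ or $\overline{W}$ is a subset of $\mathrm{Int}\,\Gamma(W)$.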

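Proof:

(a) Let $X = \mathrm{Int}\,\gamma$ and $Y = \mathrm{Int}\,\gamma'$, and assume without loss of generality that $X \cap Y \ne \emptyset$. Applying the previous lemma, we have three cases: (i) $X \cap \overline{Y} = \emptyset$, which is equivalent to $X \subset Y$, (ii) $\overline{X} \cap Y = \emptyset$, which is equivalent to $Y \subset X$, and (iii) $\overline{X} \cap \overline{Y} = \emptyset$, which is equivalent to $\overline{X} \subset Y$. In case (iii), notice that $|\overline{X}| \ge |V|/2$, which implies that $|Y| \ge |V|/2$ and $|\overline{Y}| \le |V|/2$. This contradicts the fact that $|Y| = |\mathrm{Int}\,\gamma'| \le |\overline{Y}|$ unless equality holds, i.e. unless $|Y| = |\overline{Y}| = |X| = |\overline{X}| = |V|/2$. Together with $\overline{X} \subset Y$, this implies $\overline{X} = Y$, in contradiction to our assumption $X \cap Y \ne \emptyset$.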

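(b) We consider two cases. Suppose that for every $C \in \mathcal{C}(W)$ there is a cutset $\gamma \in \Gamma(W)$ with $C \subset \mathrm{Int}\,\gamma$. Then, clearly

$$W = \bigcup_{C \in \mathcal{C}(W)} C \subset \bigcup_{\gamma \in \Gamma(W)} \mathrm{Int}\,\gamma.$$

Suppose instead that there is $C \in \mathcal{C}(W)$ such that $C \not\subset \mathrm{Int}\,\gamma$ for all $\gamma$. Then since $C$ is a subset of $\overline{D}$ for every component $D$ of $\overline{C}$, the interior of the corresponding cutset $\gamma_D = (D : \overline{D})$ must be $D$. Thus $\overline{C} = \cup_{D \in \mathcal{C}(\overline{C})} \mathrm{Int}\,\gamma_D$. In particular, since $C$ is a component of $W$,

$$\overline{W} \subset \overline{C} \subset \bigcup_{\gamma \in \Gamma(W)} \mathrm{Int}\,\gamma.$$

✷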

Next we specialize to the torus $T_{L,d} = (V_{L,d}, E_{L,d})$. Consider a set $W \subset V_{L,d}$ and a fixed minimal cutset $\gamma$ corresponding to $W$. For $e \in \gamma$ we define a dual $(d-1)$-dimensional cube $e^*$ which is (i) orthogonal to $e$ and (ii) bisects $e$, when $T_{L,d}$ is considered as immersed in the continuum torus $(\mathbb{R}/L\mathbb{Z})^d$. (In dimension $d = 3$, the two-dimensional dual cells are referred to as plaquettes.) We define a graph $\Gamma^* = (\gamma^*, E^*)$ where $\gamma^* = \{e^* : e \in \gamma\}$ and $(e_1^*, e_2^*) \in E^*$ iff $e_1^* \cap e_2^*$ is a cube of dimension $d - 2$. The components of $\Gamma^*$ are called the co-components of $\gamma$. These co-components are connected hypersurfaces of dual $(d-1)$-dimensional cells.

In the following, we will call cutsets with one co-component topologically trivial, and cutsets with more than one co-component topologically non-trivial. Small components which can be embedded in $\mathbb{Z}^d$ give rise to cutsets with only a single co-component, which are therefore topologically trivial. Topologically non-trivial cutsets arise from certain components which are large enough to “feel” the non-trivial topology of the torus. For example, the component $C = \{x \in V_{L,d} \mid 1 \le x_1 \le L/2\}$ gives rise to a cutset whose two co-components are two parallel interfaces, each of which has size $L^{d-1}$.

Lemma 3 (a) Given a fixed edge $e \in E_{L,d}$ there are at most $\nu^k$, $\nu = \min\{3, d^{64/d}\}$, distinct co-components $\gamma$ of size $k$ with $e \in \gamma$.

(b) If a cutset is non-trivial, each of its co-components contains at least $L^{d-1}$ edges.

Proof:

(a) This follows from the observation that the proofs in [22] and [17] may be applied without changes to the torus.

(b) We need some notation. Consider a set of edges $X$ and its dual $X^*$. Define the boundary $\partial X^*$ of $X^*$ as the set of $(d-2)$-dimensional hypercubes which belong to an odd number of $(d-1)$-dimensional cells in $X^*$. If $\partial X^* = \emptyset$, define the $\mathbb{Z}_2$ winding vector of $X^*$ as the vector $N(X^*) = (N_1, \ldots, N_d)$, where $N_i$ is the number of times $X^*$ intersects an elementary loop in the $i$-th lattice direction mod 2.

Let $X$ be a cutset, $X = (W : \overline{W})$, where $W \subset V$. Let $\mathbf{W} \subset (\mathbb{R}/L\mathbb{Z})^d$ be the union of all closed unit cubes with center $w \in W$. Then $X^*$ is the boundary of the set $\mathbf{W}$, and hence $\partial X^* = \emptyset$. Obviously, each elementary loop must leave and enter the set $\mathbf{W}$ the same number of times, implying that the winding vector of $X^*$ is $0$. On the other hand, it is not difficult to prove that each set of edges $X$ with $\partial X^* = \emptyset$ and $N(X^*) = 0$ is a cutset for some set of points $W \subset V$, $X = (W : \overline{W})$. Indeed, the assumptions $\partial X^* = \emptyset$ and $N(X^*) = 0$ imply that every closed loop in $T_{L,d}$ intersects $X^*$ an even number of times. Considering an arbitrary vertex $w_0 \in V$ and the set of all “walks” of the form $(w_0, w_1, \ldots, w_k)$, $\{w_i, w_{i+1}\} \in E_{L,d}$, we then define $W$ as the set of points which can be reached from $w_0$ by a walk which intersects $X^*$ an odd number of times.


property is inherited by all its co-components, implying that $\partial \tilde{\gamma}^* = \emptyset$. Obviously, $N(\tilde{\gamma}^*)$ is different from zero, since otherwise $\tilde{\gamma}$ would be a cutset itself, in contradiction to the assumption that $\gamma$ is minimal. Let $j$ be a direction for which $N_j(\tilde{\gamma}^*) \ne 0$. Then $\tilde{\gamma}^*$ intersects any fundamental loop in the $j$-direction an odd number of times, giving that $\tilde{\gamma}^*$ contains at least $L^{d-1}$ dual $(d-1)$-dimensional cells.

✷

5 Independent Sets

In this section, we give a proof of Theorem 2. We start with some notation. For a bipartite graph $G = (V, E)$ we arbitrarily call the vertices in one partition even, and those in the other partition odd. We write $V_{even}$ for the set of even vertices in $V$, and $V_{odd}$ for the set of odd vertices in $V$. We denote the collection of independent sets of $G$ by $\Omega$. Let $I$ be an independent set in $\Omega$. We then define $W_{odd}(I)$ as the set of vertices in or adjacent to a vertex in the set $I \cap V_{odd}$. Similarly $W_{even}(I)$ is defined for $I \cap V_{even}$. We define the set $\Gamma_{odd}(I)$ as the set of minimal cutsets corresponding to $W_{odd}(I)$, $\Gamma_{odd}(I) = \Gamma(W_{odd}(I))$, and similarly for the set $\Gamma_{even}(I)$. Finally, for a cutset $\gamma$, we define $V(\gamma) = \bigcup_{\{x,y\} \in \gamma} \{x, y\}$.

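Lemma 4 (a) If $\gamma \in \Gamma_{odd}(I)$, then $V(\gamma) \cap I = \emptyset$.

(b) For $\gamma \in \Gamma_{odd}(I)$, the vertices in the set $V(\gamma) \cap \mathrm{Int}\,\gamma$ are either all even or all odd.

(c) For $\gamma \in \Gamma_{odd}(I)$, there exists an independent set $I_\gamma$ such that $\Gamma_{odd}(I_\gamma) = \{\gamma\}$.

(d) Either $I \cap V_{odd}$ or $I \cap V_{even}$ is a subset of $\mathrm{Int}\,\Gamma_{odd}(I)$.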

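Proof:

(a) We have to prove that $\{x, y\} \cap I = \emptyset$ whenever $\{x, y\} \in \gamma \subset \partial W_{odd}(I)$. First notice that for an odd vertex $v$, $v \in W_{odd}(I) \Leftrightarrow v \in I$, whereas if $v$ is even then $v \in W_{odd}(I) \Leftrightarrow v$ has a neighbor $w \in I$. Suppose that $x \in I$, $y \notin I$. If $x$ is odd then $x, y \in W_{odd}(I)$. If $x$ is even, then $x, y \notin W_{odd}(I)$. In either case, we have the contradiction that $\{x, y\} \notin \partial W_{odd}(I)$.

(b) If $\gamma \in \Gamma_{odd}(I)$, then $\gamma = (D : \overline{D}) = (C : D)$ for some component $C$ of $W_{odd}(I)$ and some component $D$ of $\overline{C}$. As a consequence, either $(V(\gamma) \cap \mathrm{Int}\,\gamma) \subseteq W_{odd}(I)$, or $(V(\gamma) \cap \mathrm{Int}\,\gamma) \subseteq \overline{W_{odd}(I)}$. If an odd vertex $v$ is in the set $W_{odd}(I)$ then $v \in I$ and $w \in W_{odd}(I)$ for all neighbors $w$ of $v$. Thus an odd vertex $v \in W_{odd}(I)$ cannot be incident to an edge in $\partial W_{odd}(I)$. As a consequence, the vertices of $V(\gamma) \cap \mathrm{Int}\,\gamma$ are even if $(V(\gamma) \cap \mathrm{Int}\,\gamma) \subseteq W_{odd}(I)$ and odd otherwise.

(c) If the vertices of the set $V(\gamma) \cap \mathrm{Int}\,\gamma$ are even then let $I_\gamma = (V_{odd} \cap \mathrm{Int}\,\gamma) \cup (V_{even} \cap \overline{\mathrm{Int}\,\gamma})$. Otherwise, exchange the sets $V_{odd}$ and $V_{even}$ in the definition of $I_\gamma$.

(d) Lemma 2 implies that either

$$W_{odd}(I) \subset \mathrm{Int}\,\Gamma_{odd}(I) \quad \text{or} \quad \overline{W_{odd}(I)} \subset \mathrm{Int}\,\Gamma_{odd}(I).$$

Since $I \cap V_{odd} \subset W_{odd}(I)$ and $I \cap V_{even} \subset \overline{W_{odd}(I)}$, the result follows. ✷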

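From now on, we specialize to the graph $T_{L,d}$. For a vertex $v = (v_1, \ldots, v_d) \in V$ and a “direction” $\alpha \in \{\pm 1, \ldots, \pm d\}$, we define the shift $\sigma_\alpha(v)$ as the vertex with coordinates $v_i$ for $i \ne |\alpha|$ and $v_i + \mathrm{sign}(\alpha) \pmod{L}$ for $i = |\alpha|$, where $\mathrm{sign}(\alpha) = \alpha/|\alpha|$. For a cutset $\gamma \in \Gamma_{odd}(I)$, we define $\gamma_\alpha = \{(v, w) \mid (v, w) \in \gamma,\ v \in \mathrm{Int}\,\gamma,\ w = \sigma_\alpha(v)\}$.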
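The shift $\sigma_\alpha$ and the directed edge sets $\gamma_\alpha$ are easy to compute explicitly. The Python sketch below is a hypothetical illustration in which a cutset is represented only through its interior vertex set, and coordinates run over `0..L-1`. The example takes $I$ to be a single vertex of the $6 \times 6$ torus, so that $W_{odd}(I)$ is that vertex together with its four neighbours and $\mathrm{Int}\,\gamma$ is this plus-shaped set.

```python
def shift(v, alpha, L):
    """sigma_alpha(v): one step from v in direction alpha in {+-1, ..., +-d}."""
    i = abs(alpha) - 1                 # coordinate index, 0-based
    step = 1 if alpha > 0 else -1
    w = list(v)
    w[i] = (w[i] + step) % L           # periodic boundary conditions
    return tuple(w)

def gamma_alpha(interior, alpha, L):
    """Edges (v, sigma_alpha(v)) of the cutset with v inside and the endpoint outside."""
    return {(v, shift(v, alpha, L)) for v in interior
            if shift(v, alpha, L) not in interior}

# Hypothetical example on the 6 x 6 torus (d = 2).
L_side, dim = 6, 2
centre = (2, 3)
directions = [a for k in range(1, dim + 1) for a in (k, -k)]
plus_shape = {centre} | {shift(centre, a, L_side) for a in directions}
sizes = {a: len(gamma_alpha(plus_shape, a, L_side)) for a in directions}
total = sum(sizes.values())
```

The plus shape has 12 boundary edges in total and exactly 3 leaving in each of the $2d = 4$ directions, which matches the count $|\gamma|/2d$ asserted for cutsets in $\Gamma_{odd}(I)$ by the lemma that follows.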

Lemma 5 For any cutset $\gamma \in \Gamma_{odd}(I)$ and any direction $\alpha$,

$$|\gamma_\alpha| = |\gamma|/2d.$$

Proof: We first prove the lemma for $d = 2$. Let $\gamma^*$ be the set of edges dual to the edges in $\gamma$. The set $\gamma^*$ is a union of cycles, and each edge in the $+1$ or $-1$ direction in any of these loops is followed by an edge in the $+2$ or $-2$ direction by Lemma 4 (b). We therefore have that $|\gamma_i| + |\gamma_{-i}|$ is independent of the direction $i$. Since $\gamma$ is a cutset, $|\gamma_i|$ must be equal to $|\gamma_{-i}|$, which implies the claim. For $d > 2$, we consider the intersection of $\mathrm{Int}\,\gamma$ with a two-dimensional plane $S(\{k_i\}) = \{x \in T \mid x_i = k_i,\ i \notin \{1, 2\}\}$. Since also the points in $(V(\gamma) \cap \mathrm{Int}\,\gamma) \cap S(\{k_i\})$ are all even or all odd, the above arguments can be applied to the intersection of $\gamma$ and $S(\{k_i\})$, implying that $|\gamma_1| = |\gamma_{-1}| = |\gamma_2| = |\gamma_{-2}|$, since it is true for the intersection of these sets with any of the planes $S(\{k_i\})$. Applying this argument for an arbitrary pair of directions, we get the lemma. ✷

The next lemma is a generalization of a lemma first proved by Dobrushin in [8].

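Lemma 6 Let $\Gamma$ be a set of minimal cutsets, and let $\Omega_\Gamma = \{I : \Gamma \subset \Gamma_{odd}(I)\}$. Then

$$\mu(\Omega_\Gamma) \le \lambda^{-\sum_{\gamma \in \Gamma} (|\gamma|/2d)}.$$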

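Proof: We first note that it is enough to prove there exists an injective map $\phi_\Gamma : \Omega_\Gamma \to \Omega$ such that

$$\mu(I) = \lambda^{-\sum_{\gamma \in \Gamma} |\gamma|/2d}\, \mu(\phi_\Gamma(I)).$$

Indeed, given such a map, we have

$$\mu(\Omega_\Gamma) = \lambda^{-\sum_{\gamma \in \Gamma} |\gamma|/2d}\, \mu(\phi_\Gamma(\Omega_\Gamma)) \le \lambda^{-\sum_{\gamma \in \Gamma} |\gamma|/2d}.$$

In order to construct such a map $\phi_\Gamma$, we introduce the partial order $\gamma \le \gamma' \Leftrightarrow \mathrm{Int}\,\gamma \subset \mathrm{Int}\,\gamma'$. We then observe that, by induction, it is enough to prove that for any $\Gamma$ and any $\gamma \in \Gamma$ such that $\gamma$ is minimal in $\Gamma$ with respect to the partial order, we have an injective map $\phi_\gamma : \Omega_\Gamma \to \Omega_{\Gamma \setminus \{\gamma\}}$ such that $\mu(I) = \lambda^{-|\gamma|/2d} \mu(\phi_\gamma(I))$.

We will now construct such a map. Consider $I \in \Omega_\Gamma$. Let $\sigma = \sigma_\alpha$. The proof holds for any choice of $\alpha$. Defining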


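we will have to show that $\phi_\gamma$ is an injection, that $I' = \phi_\gamma(I)$ is an independent set with $\mu(I') = \mu(I)\lambda^{|\gamma|/2d}$ and that $I' \in \Omega_{\Gamma \setminus \{\gamma\}}$.

The first statement is obvious from the fact that the three sets $I_1 = I \cap \overline{\mathrm{Int}\,\gamma}$, $I_2 = \sigma(I \cap \mathrm{Int}\,\gamma)$ and $I_3 = \mathrm{Int}\,\gamma \setminus \sigma(\mathrm{Int}\,\gamma)$ are pairwise disjoint (use Lemma 4 (a) to see that $I_1$ and $I_2$ are disjoint).

$I_1$, $I_2$ are obviously independent and the independence of $I_3$ follows from $I_3 \subseteq V(\gamma)$ and Lemma 4 (b). To then prove that $I'$ is an independent set, we use that, again by Lemma 4 (a), the sets $I_1 \cup I_2$ and $I_1 \cup I_3$ are independent sets. It remains to show that $I_2 \cup I_3$ is also an independent set. Consider $v \in \mathrm{Int}\,\gamma \setminus \sigma(\mathrm{Int}\,\gamma)$ and $w \in \sigma(I \cap \mathrm{Int}\,\gamma)$. Then $v \notin \sigma(\mathrm{Int}\,\gamma)$ and hence $\sigma_{-\alpha}(v) \notin \mathrm{Int}\,\gamma$. On the other hand, $\sigma_{-\alpha}(w) \in I \cap \mathrm{Int}\,\gamma$. Therefore, $\sigma_{-\alpha}(v)$ and $\sigma_{-\alpha}(w)$ cannot be adjacent by Lemma 4 (a), which implies that $v$ and $w$ cannot be adjacent.

$I \in \Omega$ which contain a set of odd trivial cutsets of sizes $k_1, \ldots, k_t$. Then for $a = \min_{1 \le i \le t} k_i$, $b = \max_{1 \le i \le t} k_i$ choose edges $e_i$ in a certain fixed direction, e.g. direction 1, and then cutsets $\gamma_i \ni e_i$. (Every cut set contains an edge in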

(Note that it is safe to use the bound from Lemma 3(a) to bound the number of trivial cutsets, since for each trivial cutset, the dual is a single co-component.) Since

Pb

To prove the second statement, we use the previous lemma and the fact that each non-trivial cutset has at least two co-connected components to bound

$\gamma_{nt}$ goes over minimal cutsets with $k$ co-components. Using Lemma 3, and the fact that there are at most $L^{kd}$ possibilities for the $k$ starting edges for the $k$ co-components of $\gamma_{nt}(k)$, we conclude that

which concludes the proof of the second statement. ✷

Lemma 8 Let $0 < \alpha < 1$, and let

for some constant $c_\alpha$ depending on $\alpha$ and $d$.


where $c^*_\alpha = 6\alpha/\pi^2$. Thus $I$ is in $\Omega(k_1, \ldots, k$

i. This together with Lemma 7 gives

$\mu(\Omega_\alpha) \le$

We show next that if I is chosen from the probability distribution (2), then|I|is unlikely to be small.

Lemma 9 Let $0 < \delta < 1$. Then

Proof: There are at most $2^{L^d}$ independent subsets in $T_{L,d}$

also small enough. If none of the three events whose probabilities we discuss above occurs, then $|I| > (1 - \delta)L^d/2$ and

true if we generalize from Glauber dynamics to an ergodic Markov chain that is $\rho$-quasi-local. (See the paragraph before Theorem 2 for the definition of $\rho$-quasi-local.) To complete our proof by estimating $\Phi_S$ (see (3)) for $S = \Omega_{odd}$,

6 Swendsen-Wang Algorithm on a $d$-dimensional Torus

In this section we combine the methods and results of [5] and [4] with those of the last section to prove Theorem 1.


to undergo a phase transition as the inverse temperature, $\beta$, passes through a certain critical temperature $\beta_c = \beta_c(q, d)$. To make this statement precise, we introduce finite-volume distributions with boundary conditions. We consider the graph $G = (V, E)$, where $V = [L]^d$ and $E$ consists of all pairs of vertices in $V$ whose coordinates differ by 1 in one coordinate. We say that a vertex lies in the (inner) boundary of $V$ if one of its coordinates is either $1$ or $L$. For a coloring $\sigma$ of $V$, we then introduce the weights $w_{L,k}(\sigma) = e^{-\beta d(\sigma) + \beta n_k(\sigma)}$, where $n_k(\sigma)$ is the number of vertices in the boundary of $V$ that have color $k$. With $Z_{L,k} = \sum_\sigma w_{L,k}(\sigma)$, the finite-volume distributions $\mu_{L,k}$ with boundary condition $k$ are then defined as $\mu_{L,k}(\sigma) = w_{L,k}(\sigma)/Z_{L,k}$, and the spontaneous magnetization $m^*(\beta)$ is defined as the $L \to \infty$ limit of the finite-volume magnetizations $m_L(\beta) = L^{-d} \sum_{x \in V} (\mu_{L,1}(\sigma_x = 1) - 1/q)$. The above-mentioned phase transition can then be characterized as a transition between a high-temperature, disordered region $\beta < \beta_c$ where the spontaneous magnetization is zero, and a low-temperature, ordered region $\beta > \beta_c$ where the spontaneous magnetization is positive.

As a first step towards proving Theorem 1, we define the contours corresponding to a configuration $A \in \Omega = 2^E$. To this end, we embed the vertex set $V$ of the torus $T = (V, E)$ into the set $\mathbf{V} = (\mathbb{R}/(L\mathbb{Z}))^d$. For a set $X \subset \mathbf{V}$, we define its diameter $\mathrm{diam}(X) = \inf_{y \in \mathbf{V}} \sup_{x \in X} \mathrm{dist}(x, y)$, where $\mathrm{dist}(x, y)$ is the $\ell_\infty$-distance between the two points $x$ and $y$ in the torus $\mathbf{V}$. For an edge $e = \{x, y\} \in E$, let $\mathbf{e}$ be the set of points in $\mathbf{V}$ that lie on the line between $x$ and $y$. Given $A$, we call a closed $k$-dimensional unit hypercube $c \subset \mathbf{V}$ with vertices in $V$ occupied if all edges $e$ with $\mathbf{e} \subset c$ are in $A$. We then define the set $\mathbf{V}(A) \subset \mathbf{V}$ as the $1/3$-neighbourhood of the union of all occupied $k$-dimensional hypercubes, $k = 1, \ldots, d$, i.e., $\mathbf{V}(A) = \{x \in \mathbf{V} : \exists c \text{ occupied, such that } \mathrm{dist}(x, c) < 1/3\}$, and the set $V(A)$ as the intersection of $\mathbf{V}(A)$ with the vertex set $V$ of the discrete torus $T$. Note that $V(A) = \bigcup_{\{x,y\} \in A} \{x, y\}$. The set $\Gamma(A)$ of contours corresponding to a configuration $A \in \Omega$ are then the components of the boundary of $\mathbf{V}(A)$.

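Following [5], we decompose the set of configurations $\Omega$ into three sets $\Omega_{ord}$, $\Omega_{dis}$ and $\Omega_{Big}$. To this end, we define a contour $\gamma$ to be small if $\mathrm{diam}(\gamma) \le L/3$. The set $\Omega_{Big}$ is then just the set of configurations $A \in \Omega$ for which $\Gamma(A)$ contains at least one contour that is not small. Next, restricting ourselves to small contours $\gamma$, we define the set $\mathbf{Ext}\,\gamma$ as the larger of the two components of $\mathbf{V} \setminus \gamma$, the set $\mathrm{Ext}\,\gamma$ as the intersection of $\mathbf{Ext}\,\gamma$ with $V$, and the set $\mathrm{Int}\,\gamma$ as $V \setminus \mathrm{Ext}\,\gamma$. For $A \in \Omega \setminus \Omega_{Big}$, let $\mathrm{Int}\,A = \bigcup_{\gamma \in \Gamma(A)} \mathrm{Int}\,\gamma$ and $\mathrm{Ext}\,A = V \setminus \mathrm{Int}\,A$. The sets $\Omega_{ord}$, $\Omega_{dis}$ and $\Omega_{Big}$ are then defined as

$$\Omega_{Big} = \{A \subset E : \exists \gamma \in \Gamma(A) \text{ such that } \mathrm{diam}(\gamma) > L/3\}$$

$$\Omega_{ord} = \{A \subset E : \mathrm{diam}(\gamma) \le L/3\ \forall \gamma \in \Gamma(A) \text{ and } V(A) \cap \mathrm{Ext}\,A \ne \emptyset\}$$

$$\Omega_{dis} = \{A \subset E : \mathrm{diam}(\gamma) \le L/3\ \forall \gamma \in \Gamma(A) \text{ and } V(A) \cap \mathrm{Ext}\,A = \emptyset\}.$$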

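Lemma 11 Let $A \in \Omega_{ord}$, and let $A_{\mathrm{Ext}\,A} = \{b \in E : b \subset \mathrm{Ext}\,A\}$. Then

(a) $\mathrm{Ext}\,A = V(A) \cap \mathrm{Ext}\,A \ne \emptyset$, and

(b) $(\mathrm{Ext}\,A, A_{\mathrm{Ext}\,A})$ is connected.

Proof:

(a) Proceeding as in the proof of Lemma 2 (b), we obtain that either $V(A) \subset \mathrm{Int}\,A$ or $\overline{V(A)} \subset \mathrm{Int}\,A$. Since $A \in \Omega_{ord}$, we conclude that the latter is the case, which is equivalent to the statement that $\mathrm{Ext}\,A = V(A) \cap \mathrm{Ext}\,A$.

(b) The proof of this statement, which is implicit in [5], is straightforward but tedious. We leave it to the reader. ✷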

In the next lemma we summarize some of the results of [5] used in this paper. We need some notation. Let $A \in \Omega \setminus \Omega_{Big}$, and let $\gamma \in \Gamma(A)$. We say that $\gamma$ is an exterior contour in $\Gamma(A)$ if $\gamma \subset \mathbf{Ext}\,\gamma'$ for all $\gamma' \in \Gamma(A) \setminus \{\gamma\}$, and denote by $\Gamma_{ext}(A)$ the set of exterior contours in $\Gamma(A)$. Also, we define the size $\|\gamma\|$ of a contour $\gamma$ as the number of times $\gamma$ intersects the set $\bigcup_{e \in E} \mathbf{e}$. In order to motivate this definition, assume for a moment that the definition of the set $\mathbf{V}(A)$ had involved an $\epsilon$-neighborhood, instead of the $1/3$-neighborhood used above. With such a definition, the $(d-1)$-dimensional area of a contour $\gamma$ would actually converge to $\|\gamma\|$ as $\epsilon \to 1/2$.

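Lemma 12 For all $d \ge 2$ there are constants $c > 0$ and $q_0 < \infty$ such that the following statements hold for $q \ge q_0$.

(a) $\beta_c = \frac{\log q}{d} + O(q^{-c})$.

(b) For all $\beta > 0$,

$$\mu(\Omega_{\mathrm{Big}}) \le q^{-cL}.$$

(c) If $\beta = \beta_c$, then

$$\mu(\Omega_{\mathrm{ord}}) = \frac{q}{q+1} + O(q^{-cL}), \quad \text{and} \quad \mu(\Omega_{\mathrm{dis}}) = \frac{1}{q+1} + O(q^{-cL}).$$

(d) If $\beta \ge \beta_c$, then

$$\mu(\Omega_{\mathrm{ord}}) \ge \frac{q}{q+1} + O(q^{-cL}).$$

(e) If $\beta \ge \beta_c$ and $\Gamma$ is a set of contours, then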


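Observing that for $A \in \Omega \setminus \Omega_{\mathrm{Big}}$, the set $\mathrm{Ext}\,A$ can be written as $\bigcap_{\gamma \in \Gamma_{\mathrm{ext}}(A)} \mathrm{Ext}\,\gamma$, which in turn implies that $\mathrm{Int}\,A = \bigcup_{\gamma \in \Gamma_{\mathrm{ext}}(A)} \mathrm{Int}\,\gamma$, we can now continue as in Section 5 to prove an analog of Lemma 8. Defining

$$\Omega_{\mathrm{ord}}^{(\alpha)} = \{A \in \Omega_{\mathrm{ord}} : |\{b \in A : b \subset \mathrm{Ext}\,A\}| \ge (1 - \alpha) d L^d\},$$

the conductance $\Phi_{SW}$ of the Swendsen-Wang chain can then be estimated as follows: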

Here $A$ is chosen according to the measure $\mu$ defined in (5) and $A'$ is constructed from $A$ by one step of the Swendsen-Wang algorithm. We have

edges are deleted in Step (SW1) of Swendsen-Wang. But the number of edges deleted is dominated by the binomial $B(dL^d, 1-p)$

we use the fact that both the measure (1) (denoted $\hat\mu$ in this section) and the measure (5) (denoted $\mu$ in this section) are marginals of the Edwards-Sokal measure (6). Thus

vertices in $\mathrm{Ext}\,A$ have the same color by Lemma 11 and the definition (6) of $\pi$, we have that

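for some constant $c_*$ depending on $q$ and $\alpha$. As a consequence,

$$\pi(\hat\Omega_{\mathrm{dis}}^{(\alpha)} \mid A) \ge 1 - O(e^{-c_* L^d}) \quad \text{if } A \in \Omega_{\mathrm{dis}}^{(\alpha)}. \tag{10}$$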

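Combining (8) – (10) with Lemma 13 and the fact that $\hat\Omega_{\mathrm{dis}}^{(\alpha)} \cap \hat\Omega_{\mathrm{ord}}^{(\alpha)} = \emptyset$ if $\alpha$ is chosen small enough, we then get

$$
\begin{aligned}
\hat\mu(\hat\Omega_k^{(\alpha)}) &= \frac{1}{q}\,\hat\mu(\hat\Omega_{\mathrm{ord}}^{(\alpha)}) \\
&= \frac{1}{q}\,\mu(\Omega_{\mathrm{ord}}^{(\alpha)}) + O(e^{-c_* L^d}) + O(q^{-cL}) + O(q^{-c\alpha L^{d-1}/(\log L)^2}), \\
\hat\mu(\hat\Omega_{\mathrm{dis}}^{(\alpha)}) &= \mu(\Omega_{\mathrm{dis}}^{(\alpha)}) + O(e^{-c_* L^d}) + O(q^{-cL}) + O(q^{-c\alpha L^{d-1}/(\log L)^2}), \\
\hat\mu(\hat\Omega_{\mathrm{Rest}}^{(\alpha)}) &= O(e^{-c_* L^d}) + O(q^{-cL}) + O(q^{-c\alpha L^{d-1}/(\log L)^2}).
\end{aligned}
$$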
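To get a feeling for which of the three error terms $e^{-c_* L^d}$, $q^{-cL}$ and $q^{-c\alpha L^{d-1}/(\log L)^2}$ decays most slowly, one can compare their exponents numerically. The Python sketch below is only illustrative: the paper does not give values for the constants $c$ and $c_*$, so both are set to 1 here as a pure assumption, and the function names are hypothetical.

```python
import math


def decay_exponents(L, d, q, alpha, c=1.0, c_star=1.0):
    """Return -log(term) for each error term; a larger value means faster decay.

    The defaults c = c_star = 1 are placeholders, not values from the paper.
    """
    log_q = math.log(q)
    return {
        "exp(-c* L^d)": c_star * L**d,
        "q^(-c L)": c * L * log_q,
        "q^(-c alpha L^(d-1)/(log L)^2)":
            c * alpha * L ** (d - 1) / math.log(L) ** 2 * log_q,
    }


def dominant_term(L, d, q, alpha, c=1.0, c_star=1.0):
    """Name of the slowest-decaying (hence largest) error term."""
    exps = decay_exponents(L, d, q, alpha, c, c_star)
    return min(exps, key=exps.get)


print(dominant_term(1000, 2, 10, 0.1))
print(dominant_term(1000, 3, 10, 0.1))
```

With these placeholder constants, the term involving $L^{d-1}/(\log L)^2$ is the largest for $d = 2$, whereas for $d = 3$ and large $L$ the term $q^{-cL}$ takes over.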

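We complete our proof by estimating $\Phi_S$ (see (3)) for $S = \hat\Omega_1^{(\alpha)}$. First notice that $\hat\mu(S)\hat\mu(\bar S) \ge (1 - 1/q)/2q$. Since the heat bath algorithm can only change one vertex at a time, it does not make transitions between the different sets $\hat\Omega_k^{(\alpha)}$, nor does it make transitions between $\hat\Omega_1^{(\alpha)}$ and $\hat\Omega_{\mathrm{dis}}^{(\alpha)}$. Thus

$$
\begin{aligned}
Q(S, \bar S) &= \sum_{I \in \hat\Omega_1^{(\alpha)},\, J \in \hat\Omega_{\mathrm{Rest}}^{(\alpha)}} \hat\mu(I) P(I, J) \\
&= \sum_{I \in \hat\Omega_1^{(\alpha)},\, J \in \hat\Omega_{\mathrm{Rest}}^{(\alpha)}} \hat\mu(J) P(J, I) \\
&\le \hat\mu(\hat\Omega_{\mathrm{Rest}}^{(\alpha)}).
\end{aligned}
$$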

The theorem now follows. ✷
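The last chain of (in)equalities uses only reversibility and the fact that every transition out of $S$ lands in the small "rest" set. The following Python sketch illustrates this on a toy three-state chain. It is not the Potts heat bath chain itself: the states, the transition matrix and the normalisation $\Phi_S = Q(S, \bar S)/(\mu(S)\mu(\bar S))$ are assumptions made for illustration, since definition (3) is not restated in this section.

```python
from fractions import Fraction as F


def flow(pi, P, src, dst):
    """Q(src, dst): sum of pi(I) * P(I, J) over I in src and J in dst."""
    return sum(pi[i] * P[i].get(j, 0) for i in src for j in dst)


def conductance(pi, P, S):
    """Q(S, S^c) / (pi(S) * pi(S^c)); this normalisation is an assumption."""
    complement = set(pi) - set(S)
    mass = sum(pi[i] for i in S)
    return flow(pi, P, S, complement) / (mass * (1 - mass))


# Toy reversible chain: "one" and "dis" communicate only through "rest".
pi = {"one": F(9, 20), "rest": F(1, 10), "dis": F(9, 20)}
P = {
    "one": {"one": F(9, 10), "rest": F(1, 10)},
    "rest": {"one": F(9, 20), "rest": F(1, 10), "dis": F(9, 20)},
    "dis": {"rest": F(1, 10), "dis": F(9, 10)},
}
S = {"one"}

q_out = flow(pi, P, S, {"rest", "dis"})   # flow out of S
q_back = flow(pi, P, {"rest"}, S)         # reversed flow from "rest" into S
phi = conductance(pi, P, S)
print(q_out, q_back, pi["rest"], phi)
```

In this toy chain the outgoing flow equals the reversed flow from the rest state and is bounded by the stationary mass of that state, mirroring the three lines of the estimate; a small rest mass therefore forces a small conductance.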

Acknowledgements: The authors wish to thank Marek Biskup, Michael Freedman and Roman Kotecký for numerous helpful discussions. This work began at Microsoft Research, and four of the authors (A.F., P.T., E.V. and V.V.) would like to thank Microsoft Research, especially the theory group, for providing this opportunity.

References

[1] D. Aldous and J. Fill, Reversible Markov Chains and Random Walks on Graphs, in preparation. Some chapters available at

http://stat-www.berkeley.edu/pub/users/aldous/book.html.

[2] B. Bollobás and I. Leader, Edge-isoperimetric inequalities in the grid, Combinatorica 11 (1991) 299-314.

[3] C. Borgs and J. Imbrie, A unified approach to phase diagrams in field theory and statistical mechanics, Commun. Math. Phys. 123 (1989) 305-328.

[4] C. Borgs and R. Kotecký, A rigorous theory of finite-size scaling at first-order transitions, Jour. Statist. Phys. 61 (1990) 79–110.

[5] C. Borgs, R. Kotecký and S. Miracle-Solé, Finite-size scaling for Potts models, Jour. Statist. Phys. 62 (1991) 529–551.

[6] C. Cooper, M. Dyer, A. M. Frieze and R. Rue, Mixing properties of the Swendsen-Wang process on classes of graphs II, in preparation.

[7] C. Cooper and A. M. Frieze, Mixing properties of the Swendsen-Wang process on classes of graphs, to appear in Proceedings of DIMACS Workshop on Statistical Physics Methods in Discrete Probability, Combinatorics and Theoretical Computer Science.

[8] R. L. Dobrushin, An investigation of Gibbs states for three-dimensional lattice systems (Russian with English summary), Teor. Verojatnost. i Primenen. 18 (1973) 261–279.

English translation: Theor. Probability Appl. 18 (1974) 253–271.

[9] M. E. Dyer, A. M. Frieze and M. R. Jerrum, On counting independent sets in sparse graphs, preprint.

[10] M. E. Dyer and C. Greenhill, On Markov chains for independent sets, preprint.

[11] R. G. Edwards and A. D. Sokal, Generalizations of the Fortuin-Kasteleyn-Swendsen-Wang representation and Monte Carlo algorithm, Phys. Rev. D 38 (1988) 2009-2012.

[12] C. Fortuin and P. Kasteleyn, On the random cluster model I: Introduction and relation to other models, Physica 57 (1972) 536-564.

[13] V. Gore and M. Jerrum, The Swendsen-Wang process does not always mix rapidly, Proceedings of the 29th Annual ACM Symposium on Theory of Computing, (1997) 674-681.

[14] M. Huber, Efficient exact sampling from the Ising model using Swendsen-Wang, Proceedings of the Symposium on Discrete Algorithms, (1999).

[15] M. R. Jerrum, Talk given at Workshop on Randomised Approximation and Stochastic Simulation in Warwick, England, 1998.


[17] J. L. Lebowitz and A. E. Mazel, Improved Peierls argument for higher dimensional Ising models, Jour. Statist. Phys. 90 (1998) 1051-1059.

[18] X-J. Li and A. D. Sokal, Rigorous lower bound on the dynamic critical exponents of the Swendsen-Wang algorithm, Phys. Rev. Lett. 63 (1989) 827-830.

[19] M. Luby and E. Vigoda, Fast convergence of the Glauber dynamics for sampling independent sets, to appear in Proceedings of DIMACS Workshop on Statistical Physics Methods in Discrete Probability, Combinatorics and Theoretical Computer Science.

[20] F. Martinelli, Dynamical Analysis of the low temperature cluster algorithms, Jour. Statist. Phys. 66 (1992) 1245-1276.

[21] S. A. Pirogov and Ya. G. Sinai, Phase diagrams of classical lattice systems, Theor. Math. Phys. 25 (1975) 1185–1192; Phase diagrams of classical lattice systems. Continuation, Theor. Math. Phys. 26 (1976) 39–49.

[22] D. Ruelle, Statistical Mechanics: Rigorous Results, W. A. Benjamin, 1969.

[23] A. J. Sinclair and M. R. Jerrum, Approximate counting, uniform generation and rapidly mixing Markov chains, Information and Computation 82 (1989) 93-133.

[24] R. Swendsen and J-S. Wang, Non-universal critical dynamics in Monte-Carlo simulation, Phys. Rev. Lett. 58 (1987) 86-88.

[25] L. Thomas, Bound on the mass gap for finite volume stochastic Ising models at low temperature, Commun. Math. Phys. 126 (1989) 1-11.

[26] D. J. A. Welsh, Complexity: Knots, Colourings and Counting, London Mathematical Society Lecture Note Series 186, 1993.

[27] F. Y. Wu, The Potts model, Rev. Mod. Phys. 54 (1982) 235-268.