
5.5 Special Cases

5.5.2 Special Case II

[Figure 5.6 omitted from this extraction. It depicts the special case II network: encoders and decoders with encoding functions $f$, $g_1$, and $g_2$; link rates $R_1 = R_3$ (carrying $g_2$), $R_2$, and $R_4$ (carrying $g_1$); the source pair $(X, Y)$; and the reproductions $\hat{X}$ and $\hat{Y}$.]

Figure 5.6: Special case II.

As a second special case, let $X_1 = C_1$ with probability 1 and set $R_1 = R_3$, but allow $X_2$ to be an arbitrary finite-alphabet random variable. The result is the network shown in Figure 5.6. In this network, there is one middle node on the top link and a direct link from the transmitter to the receiver. Let $(D_1, D_Y)$ be the distortion requirements for reproducing $(X, Y)$. Our purpose is to show that a rate vector $(R_1, R_2, R_3, R_4)$ is in the rate distortion region if and only if

\[
R_1 = R_3 \ge R_{Y|U}(D_Y),
\]
\[
R_2 \ge R_{X|U}(D_1) + I(X, Y; U),
\]
\[
R_4 \ge I(X, Y; U)
\]

for some random variable $U$. To prove the converse, from Theorem 5.4.1 it suffices to check that $R_1 \ge R_{Y|U}(D_Y)$. Let $f$, $g_1$, and $g_2$ be the encoding functions as indicated in Figure 5.6, and let $F = f(X^n, Y^n)$, $G_1 = g_1(F)$, and $G_2 = g_2(X^n, Y^n)$ denote the encoded messages. Then, following the approach used in (5.11), we have $H(\hat{Y}^n \mid G_1) \ge n R_{Y|U}(D_Y)$, where $\hat{Y}^n$ is the reproduction of $Y^n$. Hence

\[
n R_1 = n R_3 \ge H(G_2) \ge H(G_2 \mid G_1) = H(G_2 \mid G_1) + H(\hat{Y}^n \mid G_1, G_2) = H(\hat{Y}^n, G_2 \mid G_1) \ge H(\hat{Y}^n \mid G_1) \ge n R_{Y|U}(D_Y),
\]
where the first equality holds because $\hat{Y}^n$ is a function of $(G_1, G_2)$, so $H(\hat{Y}^n \mid G_1, G_2) = 0$.

To show achievability, the idea is basically the same as in the proof of Theorem 5.4.2, so we describe it only briefly. Let $U$ be given and $\delta > 0$ be arbitrary. Pick random variables $\hat{X}$ and $\hat{Y}$ satisfying
\[
I(X; \hat{X} \mid U) < R_{X|U}(D_1) + \delta, \qquad E\, d(X, \hat{X}) \le D_1,
\]
\[
I(Y; \hat{Y} \mid U) < R_{Y|U}(D_Y) + \delta, \qquad E\, d(Y, \hat{Y}) \le D_Y.
\]
Let $n$ be given. We generate the codebook as follows:

1. Randomly generate $U^n(1), \ldots, U^n(M)$, where $M = 2^{n(I(X,Y;U)+\delta)}$.

2. For each $i \in \{1, \ldots, M\}$, randomly generate $\hat{X}^n(i, 1), \ldots, \hat{X}^n(i, T)$ from the set $A_\epsilon^{(n)}(\hat{X} \mid U^n(i))$, where $T = 2^{n(I(X;\hat{X}|U)+\delta)}$.

3. For each $i \in \{1, \ldots, M\}$, randomly generate $\hat{Y}^n(i, 1), \ldots, \hat{Y}^n(i, S)$ from the set $A_\epsilon^{(n)}(\hat{Y} \mid U^n(i))$, where $S = 2^{n(I(Y;\hat{Y}|U)+\delta)}$.

If $(x^n, y^n)$ is the source sequence, the transmitter finds indices $s \in \{1, \ldots, S\}$, $t \in \{1, \ldots, T\}$, and $m \in \{1, \ldots, M\}$ such that
\[
(x^n, y^n, U^n(m), \hat{X}^n(m, t), \hat{Y}^n(m, s)) \in A_\epsilon^{(n)}(X, Y, U, \hat{X}, \hat{Y}).
\]
It then transmits $s$ through the top link and $(m, t)$ through the bottom link. The encoder at the middle node then forwards $m$ to the receiver. If $(m, t)$ is observed at the middle node, the decoder there naturally reproduces $X^n$ as $\hat{X}^n(m, t)$. If $s$ is received from the top link and $m$ is observed from the bottom link at the receiver, the decoder reproduces $Y^n$ as $\hat{Y}^n(m, s)$. With this coding scheme, the same argument as in the proof of Theorem 5.4.2 implies that when $n$ is sufficiently large, one has, with sufficiently high probability,

\[
(x^n, \hat{X}^n(m, t)) \in A_\epsilon^{(n)}(X, \hat{X}) \quad \text{and} \quad (y^n, \hat{Y}^n(m, s)) \in A_\epsilon^{(n)}(Y, \hat{Y}).
\]

Then the reproductions $\hat{X}^n(m, t)$ and $\hat{Y}^n(m, s)$ meet the distortion requirements $(D_1, D_Y)$.
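As a quick rate accounting for this scheme (our own check, not part of the original argument), the top link carries $s$, the bottom link carries $(m, t)$, and the middle node forwards $m$, so the per-symbol rates used are
\[
\frac{1}{n}\log S = I(Y; \hat{Y} \mid U) + \delta < R_{Y|U}(D_Y) + 2\delta,
\]
\[
\frac{1}{n}\log (MT) = I(X, Y; U) + I(X; \hat{X} \mid U) + 2\delta < I(X, Y; U) + R_{X|U}(D_1) + 3\delta,
\]
\[
\frac{1}{n}\log M = I(X, Y; U) + \delta,
\]
which approach the three constraints defining the claimed region as $\delta \to 0$.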

Chapter 6

Conclusions and Future Work

In Chapter 2, we proved that the lossless rate region for canonical source coding and the lossy rate regions for both canonical and non-canonical source coding are continuous in the source distribution. Although a counterexample shows that the lossless rate region for non-canonical source coding may fail to be continuous in distribution, it is s-continuous, meaning that as long as two distributions are sufficiently close and have the same support, the corresponding lossless rate regions are arbitrarily close. We also proved that the zero-error rate region is s-continuous.

In Chapter 3, the exponentially strong converse is proved for the coded side information problem, for lossless source coding over multicast networks with side information at the end nodes, and for lossy source coding over point-to-point links.

Chapter 4 introduces a family of algorithms that guarantee a $(1+\epsilon)$-approximation of the rate regions for some example network source coding problems, including the lossless rate region for the coded side information problem, the Wyner-Ziv rate distortion function, and the Berger et al. bound for the lossy coded side information problem. The proposed algorithms, which approximate the rate regions based on their single-letter characterizations, may be improved by cleverly choosing the family of quantized conditional distributions of the auxiliary random variables given the source random variables.

By adapting the techniques used to solve simple network source coding problems in the literature, we use auxiliary random variables to bound the rate regions for two basic network source coding problems that capture some important characteristics of general networks: the two-hop line network and the diamond network. The resulting performance bounds provide a way to bound larger networks by decomposing them into basic components.

The results in this thesis may provide a way to understand some theoretical problems in the field of source coding over networks, for instance, the existence of single-letter characterizations, the tightness of some early derived bounds, and the behavior of achievable rate regions as functions of the source distribution and the distortion vector. Some of the techniques may also apply to the study of capacity regions for channel coding over networks.

Appendix

A Lemmas for Theorem 2.4.1

The following sequence of lemmas builds to Theorem 2.4.1, which proves that for the canonical source coding problem $\mathcal{R}(P_{X,Y}, D)$ is uniformly continuous in $D$. First, Lemma A.1 and Corollary A.2 bound the conditional entropy of one random vector given another as a function of the Hamming distance between them.

Lemma A.1 Let $V^n = (V_1, V_2, \ldots, V_n)$ be a random vector in $\mathcal{V}^n$ and let $w$ be the per-symbol expected Hamming weight of $V^n$,
\[
w = \frac{1}{n} E\, d_H(V^n, 0) = \frac{1}{n} E\, |\{i \mid V_i \ne 0\}|.
\]
Then
\[
H(V^n) \le n(H(w) + w \log(m_s - 1)).
\]

Proof. First notice that since $V^n \in \mathcal{V}^n$, there are at most $m_s$ possible values for $V_i$ for every $i \in \{1, \ldots, n\}$. For every $i \in \{1, \ldots, n\}$, let $\{a_{i,0}, \ldots, a_{i,m_s-1}\}$ be the set of possible values for $V_i$. For each $i \in \{1, \ldots, n\}$ and each $j \in \{0, \ldots, m_s - 1\}$, let $p_{i,j}$ denote the probability $\Pr(V_i = a_{i,j})$. Then
\[
w = \frac{1}{n} E\, d_H(V^n, 0) = \frac{1}{n} \sum_{i=1}^n \sum_{j=1}^{m_s - 1} p_{i,j}.
\]

Let $w_i := \sum_{j=1}^{m_s - 1} p_{i,j}$. The maximal entropy $H(V_i)$ over all distributions with the same values $(w_1, \ldots, w_n)$ occurs when $p_{i,0} = 1 - w_i$ and $p_{i,j} = \frac{w_i}{m_s - 1}$ for each $j \ne 0$. Hence
\[
H(V_i) \le H\left(1 - w_i, \frac{w_i}{m_s - 1}, \ldots, \frac{w_i}{m_s - 1}\right) = H(w_i) + w_i \log(m_s - 1).
\]

Therefore, since the entropy of a random vector is at most the sum of its per-symbol entropies and by the concavity of the entropy function,
\[
H(V^n) \le \sum_{i=1}^n H(V_i) \le \sum_{i=1}^n \left[H(w_i) + w_i \log(m_s - 1)\right] \le n(H(w) + w \log(m_s - 1)). \qquad \Box
\]
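As a quick numeric sanity check of the single-letter bound $H(V_i) \le H(w_i) + w_i \log(m_s - 1)$ used above, the following sketch evaluates both sides for one symbol (the distribution and alphabet size are arbitrary illustrative choices, not taken from the thesis; logarithms are base 2):

\begin{verbatim}
import math

def entropy(p):
    # Shannon entropy in bits; zero-probability entries contribute nothing
    return -sum(q * math.log2(q) for q in p if q > 0)

# Arbitrary example distribution of one symbol V over an alphabet of size
# m_s = 4, with value 0 listed first; w is the probability that V != 0.
p = [0.7, 0.2, 0.06, 0.04]
m_s = len(p)
w = sum(p[1:])

lhs = entropy(p)                                    # H(V)
rhs = entropy([1 - w, w]) + w * math.log2(m_s - 1)  # H(w) + w log(m_s - 1)
print(f"H(V) = {lhs:.4f} <= {rhs:.4f} = H(w) + w log(m_s - 1)")
assert lhs <= rhs + 1e-12
# Equality holds when the off-zero mass is uniform, e.g. p = [0.7, 0.1, 0.1, 0.1].
\end{verbatim}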

For any $V^n \in \mathcal{V}^n$, the set $\{a \in \mathcal{V}^n \mid \Pr(V^n = a) > 0\}$ is called the support set of $V^n$.

Corollary A.2 Let $V^n$ and $W^n$ be two random vectors in $\mathcal{V}^n$ with the same support set. Then
\[
\frac{1}{n} H(V^n \mid W^n) \le H(\tau) + \tau \log(m_s - 1) = O\left(\tau \log \frac{1}{\tau}\right),
\]
where
\[
\tau := E\left[\frac{1}{n} d_H(V^n, W^n)\right].
\]

Proof. Apply Lemma A.1 to the random vector $V^n - W^n$. $\Box$

For any $a \in \mathbb{R}_+^{\mathcal{D}}$ and any $\delta > 0$, define $a_\delta := (a_{\delta,\kappa})_{\kappa \in \mathcal{D}}$, where for each $\kappa \in \mathcal{D}$,
\[
a_{\delta,\kappa} :=
\begin{cases}
\delta, & \text{if } a_\kappa \le \delta, \\
a_\kappa, & \text{otherwise.}
\end{cases}
\tag{A-1}
\]

In Lemma A.3, we examine the relationship between $\mathcal{R}(P_{X,Y}, D)$ and $\mathcal{R}(P_{X,Y}, D_\delta)$.

Lemma A.3 Let $N$ be a canonical network source coding problem. For any $\epsilon > 0$, there exists $\delta > 0$ such that for any $D \in \mathbb{R}_+^{\mathcal{D}}$ and any $P_{X,Y} \in \mathcal{M}$, $\mathcal{R}(P_{X,Y}, D)$ and $\mathcal{R}(P_{X,Y}, D_\delta)$ are $\epsilon$-close.

Proof. For any $a \in \mathbb{R}_+^{\mathcal{D}}$ and any $\delta > 0$, let $[a]_\delta = ([a]_{\delta,\kappa})_{\kappa \in \mathcal{D}}$ be defined as
\[
[a]_{\delta,\kappa} :=
\begin{cases}
0, & \text{if } a_\kappa \le \delta, \\
a_\kappa, & \text{otherwise.}
\end{cases}
\]
Let $\epsilon > 0$ be given. The majority of the proof works to show that there exists a $\delta > 0$ such that for any $D = (D_\kappa)_{\kappa \in \mathcal{D}} \in \mathbb{R}_+^{\mathcal{D}}$ and any $P_{X,Y} \in \mathcal{M}$, $\mathcal{R}(P_{X,Y}, D)$ and $\mathcal{R}(P_{X,Y}, [D]_\delta)$ are $\epsilon/2$-close. From this we can observe that both $\mathcal{R}(P_{X,Y}, D)$ and $\mathcal{R}(P_{X,Y}, D_\delta)$ are $\epsilon/2$-close to $\mathcal{R}(P_{X,Y}, [D]_\delta)$, and hence $\mathcal{R}(P_{X,Y}, D)$ and $\mathcal{R}(P_{X,Y}, D_\delta)$ are $\epsilon$-close, as desired.

To show that $\mathcal{R}(P_{X,Y}, D)$ and $\mathcal{R}(P_{X,Y}, [D]_\delta)$ are $\epsilon/2$-close, note first that $[D]_\delta \le D$ for any $\delta > 0$, so $\mathcal{R}(P_{X,Y}, [D]_\delta) \subseteq \mathcal{R}(P_{X,Y}, D)$. Therefore, we need only choose an appropriate $\delta > 0$ (independent of $(X, Y)$ and $D$) such that
\[
\mathcal{R}(P_{X,Y}, D) + (\epsilon/2) \cdot \mathbf{1} \subseteq \mathcal{R}(P_{X,Y}, [D]_\delta).
\]
For any achievable rate vector $R \in \mathcal{R}(P_{X,Y}, D)$, pick a rate-$R$, length-$n$ block code $C$ such that for each $\kappa = (v, \theta) \in \mathcal{D}$,
\[
E\, d(\theta^n(X^n), \hat{\theta}^n(v)) \le D_\kappa + \tau d_{\min},
\]
where $\hat{\theta}^n(v)$ is the reproduction of $\theta^n(X^n)$ at $v$ and $\tau > 0$ is a constant satisfying
\[
sk(H(2\tau) + 2\tau \log m) < \frac{\epsilon}{2}.
\]

Our goal is to use $C$ to construct another code for which the error probability of the reproduction for each $\kappa \in \mathcal{D}$ with $D_\kappa \le \delta$ can be made arbitrarily small. By Corollary A.2, for each $\kappa = (v, \theta) \in \mathcal{D}$ such that $\frac{1}{n} E\, d(\theta^n(X^n), \hat{\theta}^n(v)) \le 2\tau d_{\min}$, we know that
\[
H(\theta^n(X^n) \mid \hat{\theta}^n(v)) \le n\epsilon/(2sk).
\]

Since the reproduction $\hat{\theta}^n(v)$ is known at node $v$, an additional rate vector
\[
\left(\frac{s}{n} H(\theta^n(X^n) \mid \hat{\theta}^n(v)) + \epsilon/(2k)\right) \cdot \mathbf{1}
\]
suffices to describe $\theta^n(X^n)$ losslessly at node $v$ by Lemma 2.2.18. We therefore modify code $C$ by adding $|I(\theta)|$ additional descriptions for reproducing $\theta^n(X^n)$ losslessly given $\hat{\theta}^n(v)$ at node $v$, as described in Lemma 2.2.18. For every $(v, \theta) \in \mathcal{D}$ and every $i \in I(\theta)$, the modified code sends the corresponding additional description along a path connecting $v$ and some source node that has access to source $X_i$. The total additional rate vector (for all such pairs of sources and demands) on each link is less than $(\epsilon/2) \cdot \mathbf{1}$. Therefore, the rate vector $R + (\epsilon/2) \cdot \mathbf{1}$ is $[D]_\delta$-achievable. By letting $\delta = \tau d_{\min}$, we get the desired result. $\Box$

Lemma A.4 shows that there exists a rate vector that is $D$-achievable for any distribution $P_{X,Y}$ and any distortion vector $D \in \mathbb{R}_+^{\mathcal{D}}$.

Lemma A.4
\[
\bigcap_{P_{X,Y} \in \mathcal{M}} \mathcal{R}(P_{X,Y}, 0) \ne \emptyset.
\]

Proof. Since each component of any vector $(X, Y)$ of source and side information symbols has alphabet size no greater than $m$, the rate vector $k \log(m) \cdot \mathbf{1} \in \mathbb{R}_+^{\mathcal{E}}$ achieves zero distortion for any $P_{X,Y} \in \mathcal{M}$. $\Box$

Lemma A.5 combines the results of Lemmas A.1-A.4 and is applied in the proof of Theorem 2.4.1.

Lemma A.5 Let $N$ be a canonical network source coding problem. For any $\epsilon > 0$, there exists a $\delta > 0$ such that for any $P_{X,Y} \in \mathcal{M}$ and any $D \in \mathbb{R}_+^{\mathcal{D}}$, $\mathcal{R}(P_{X,Y}, D)$ and $\mathcal{R}(P_{X,Y}, D + \delta \cdot \mathbf{1})$ are $\epsilon$-close.

Proof. Let $\epsilon > 0$ be given. We prove the lemma by first showing that there exists a $\tau > 0$ such that $\mathcal{R}(P_{X,Y}, D)$ and $\mathcal{R}(P_{X,Y}, D_\tau)$ are $\epsilon/2$-close for all $D \in \mathbb{R}_+^{\mathcal{D}}$, and then showing that there exists a $0 < \delta < \tau$ such that
\[
\mathcal{R}(P_{X,Y}, D_\delta + \delta \cdot \mathbf{1}) + \frac{\epsilon}{2} \cdot \mathbf{1} \subseteq \mathcal{R}(P_{X,Y}, D_\tau).
\]
Together, these results imply that $\mathcal{R}(P_{X,Y}, D)$ and $\mathcal{R}(P_{X,Y}, D_\delta + \delta \cdot \mathbf{1})$ are $\epsilon$-close, since
\[
\mathcal{R}(P_{X,Y}, D_\delta + \delta \cdot \mathbf{1}) + \epsilon \cdot \mathbf{1} \subseteq \mathcal{R}(P_{X,Y}, D_\tau) + \frac{\epsilon}{2} \cdot \mathbf{1} \subseteq \mathcal{R}(P_{X,Y}, D)
\]
and
\[
\mathcal{R}(P_{X,Y}, D) \subseteq \mathcal{R}(P_{X,Y}, D + \delta \cdot \mathbf{1}) \subseteq \mathcal{R}(P_{X,Y}, D_\delta + \delta \cdot \mathbf{1}).
\]
This proves the desired result since $\mathcal{R}(P_{X,Y}, D + \delta \cdot \mathbf{1})$ is sandwiched between $\mathcal{R}(P_{X,Y}, D)$ and $\mathcal{R}(P_{X,Y}, D_\delta + \delta \cdot \mathbf{1})$.

The first result follows immediately from Lemma A.3. Precisely,

(i) there exists a $\tau > 0$ such that $\mathcal{R}(P_{X,Y}, D)$ and $\mathcal{R}(P_{X,Y}, D_\tau)$ are $\epsilon/2$-close for all $D \in \mathbb{R}_+^{\mathcal{D}}$.

(ii) We next show that there exists a $0 < \delta < \tau$ such that for any $D \in \mathbb{R}_+^{\mathcal{D}}$ with $D_\kappa \ge \tau$ for all $\kappa \in \mathcal{D}$,
\[
\mathcal{R}(P_{X,Y}, D + \delta \cdot \mathbf{1}) + \frac{\epsilon}{2} \cdot \mathbf{1} \subseteq \mathcal{R}(P_{X,Y}, D).
\]

To prove this, first fix $R_0 \in \bigcap_{P_{X,Y} \in \mathcal{M}} \mathcal{R}(P_{X,Y}, 0)$; this is possible by Lemma A.4. For any $0 < \delta < \tau$, the conditions $D \in \mathbb{R}_+^{\mathcal{D}}$ and $D_\kappa \ge \tau$ for all $\kappa \in \mathcal{D}$ together imply
\[
\left(1 - \frac{\delta}{\tau}\right)(D + \delta \cdot \mathbf{1}) + \frac{\delta}{\tau}(\delta \cdot \mathbf{1}) \le D
\]
since for any $\kappa \in \mathcal{D}$,
\[
\left(1 - \frac{\delta}{\tau}\right)(D_\kappa + \delta) + \frac{\delta}{\tau}\,\delta - D_\kappa = \delta - \frac{\delta}{\tau} D_\kappa = \delta\left(1 - \frac{D_\kappa}{\tau}\right) \le 0.
\]

By the convexity and the monotonicity of $\mathcal{R}(P_{X,Y}, D)$ in $D$, we have
\[
\left(1 - \frac{\delta}{\tau}\right)\mathcal{R}(P_{X,Y}, D + \delta \cdot \mathbf{1}) + \frac{\delta}{\tau}\mathcal{R}(P_{X,Y}, \delta \cdot \mathbf{1})
\subseteq \mathcal{R}\left(P_{X,Y}, \left(1 - \frac{\delta}{\tau}\right)(D + \delta \cdot \mathbf{1}) + \frac{\delta}{\tau}(\delta \cdot \mathbf{1})\right)
\subseteq \mathcal{R}(P_{X,Y}, D),
\]
which implies that for all $D \in \mathbb{R}_+^{\mathcal{D}}$ with $D_\kappa \ge \tau$ for all $\kappa \in \mathcal{D}$,
\[
\mathcal{R}(P_{X,Y}, D + \delta \cdot \mathbf{1}) + \frac{\delta}{\tau} R_0 \subseteq \mathcal{R}(P_{X,Y}, D).
\]

This together with the definition (A-1) of $D_\tau$ implies that for all $D \in \mathbb{R}_+^{\mathcal{D}}$,
\[
\mathcal{R}(P_{X,Y}, D_\tau + \delta \cdot \mathbf{1}) + \frac{\delta}{\tau} R_0 \subseteq \mathcal{R}(P_{X,Y}, D_\tau).
\]
By the monotonicity of $\mathcal{R}(P_{X,Y}, D)$ in $D$, the following holds for all $D \in \mathbb{R}_+^{\mathcal{D}}$ and any $0 < \delta < \tau$:
\[
\mathcal{R}(P_{X,Y}, D_\delta + \delta \cdot \mathbf{1}) + \frac{\delta}{\tau} R_0 \subseteq \mathcal{R}(P_{X,Y}, D_\tau).
\]
Thus the desired result holds for all $0 < \delta < \tau$ that satisfy
\[
\frac{\delta}{\tau} R_0 \le \frac{\epsilon}{2} \cdot \mathbf{1}. \qquad \Box
\]

B Continuity of $\mathcal{R}(P_{X,Y}, D)$ with respect to $D$

Lemma B.1 shows that the set $\mathcal{R}(P_{X,Y}, D)$ is continuous at $D$ when $N$ is non-canonical and $D > 0$ or $N$ is canonical and $D \ge 0$. The proof is similar to those of Theorems 2.4.1 and 2.4.2; hence we state the lemma without proof.

Lemma B.1 Let $N$ be a network source coding problem. If $N$ is non-canonical and $D > 0$ or $N$ is canonical and $D \ge 0$, then $\mathcal{R}(P_{X,Y}, D)$ is continuous at $D$.

C Lemmas for Section 2.5.2

In Lemma C.1, we show that when two distributions $P_{X,Y}$ and $Q_{X,Y}$ are sufficiently close and the support of $P_{X,Y}$ is a subset of that of $Q_{X,Y}$, there exists a joint distribution $T_{X,Y,X',Y'}$ with marginal on $(X, Y)$ equal to $Q_{X,Y}$ and with conditional distribution of $(X, Y)$ given the event $(X, Y) = (X', Y')$ equal to $P_{X,Y}$, for which $(X, Y)$ equals $(X', Y')$ with high probability.

Lemma C.1 For any $\epsilon > 0$ and any two distributions $P_{X,Y} \in \mathcal{M}$ and $Q_{X,Y} \in \mathcal{M}$ satisfying
\[
(1 - \epsilon) P_{X,Y}(x, y) \le Q_{X,Y}(x, y) \quad \forall (x, y) \in \mathcal{A},
\]
there exists a joint distribution $T_{X,Y,X',Y'}$ of $(X, Y)$ with alphabet $\mathcal{A}$ and $(X', Y')$ with alphabet $\mathcal{A} \cup \{a_0\}$, where $a_0$ is an extra symbol not in $\mathcal{A}$, such that

(a) $Q_{X,Y} = T_{X,Y}$, the marginal of $T_{X,Y,X',Y'}$ on $(X, Y)$.

(b) $P_{X,Y}(x, y) = T_{X,Y \mid \{(X,Y) = (X',Y')\}}(x, y)$ for all $(x, y) \in \mathcal{A}$, where $T_{X,Y \mid \{(X,Y) = (X',Y')\}}$ is the conditional distribution of $T_{X,Y,X',Y'}$ on $(X, Y)$ given the event $\{(X, Y) = (X', Y')\}$.

(c) $\Pr((X, Y) = (X', Y')) = 1 - \epsilon$.

Proof. For any $(x, y) \in \mathcal{A}$, define
\[
T_{X',Y'}(x, y) := (1 - \epsilon) P_{X,Y}(x, y), \qquad T_{X,Y \mid X',Y'}(x, y \mid x, y) := 1,
\]
\[
T_{X',Y'}(a_0) := \epsilon, \qquad T_{X,Y \mid X',Y'}(x, y \mid a_0) := \frac{Q_{X,Y}(x, y) - (1 - \epsilon) P_{X,Y}(x, y)}{\epsilon}.
\]
All other values of $T_{X,Y,X',Y'}$ are zero. Then this distribution $T_{X,Y,X',Y'}$ has marginal on $(X, Y)$ equal to $Q_{X,Y}$. Also,
\[
\Pr((X, Y) = (X', Y')) = \sum_{(x,y) \in \mathcal{A}} T_{X,Y,X',Y'}(x, y, x, y) = 1 - \epsilon.
\]
The conditional distribution of $(X, Y) = (x, y)$ given $(X, Y) = (X', Y')$ for all $(x, y) \in \mathcal{A}$ is
\[
T_{X,Y \mid \{(X,Y) = (X',Y')\}}(x, y) = \frac{T_{X,Y,X',Y'}(x, y, x, y)}{\Pr((X, Y) = (X', Y'))} = P_{X,Y}(x, y). \qquad \Box
\]
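The proof above is a coupling construction: it routes as much mass as possible through the diagonal $(X, Y) = (X', Y')$ and sends the remainder through the extra symbol $a_0$. The following sketch (our own illustration, with a made-up two-symbol alphabet) builds $T$ numerically and checks properties (a)-(c):

\begin{verbatim}
# Build the coupling T of Lemma C.1 for a toy alphabet and verify
# (a) the marginal on (X,Y) is Q, (b) conditioning on the equality
# event recovers P, and (c) Pr((X,Y) = (X',Y')) = 1 - eps.
eps = 0.1
A = ["a", "b"]                # toy alphabet for the pair (x, y)
P = {"a": 0.5, "b": 0.5}      # hypothetical P_{X,Y}
Q = {"a": 0.47, "b": 0.53}    # hypothetical Q_{X,Y}; (1-eps)P <= Q holds
assert all((1 - eps) * P[s] <= Q[s] for s in A)

a0 = "a0"                     # the extra symbol outside the alphabet
T = {}                        # T[(xy, xy_prime)] = joint probability
for s in A:
    T[(s, s)] = (1 - eps) * P[s]          # mass on the diagonal
    T[(s, a0)] = Q[s] - (1 - eps) * P[s]  # leftover mass routed via a0

for s in A:  # (a) marginal of T on (X,Y) equals Q
    assert abs(sum(p for (xy, _), p in T.items() if xy == s) - Q[s]) < 1e-12

p_eq = sum(p for (xy, xyp), p in T.items() if xy == xyp)
assert abs(p_eq - (1 - eps)) < 1e-12      # (c)

for s in A:  # (b) conditional distribution given the equality event is P
    assert abs(T[(s, s)] / p_eq - P[s]) < 1e-12
print("Lemma C.1 properties (a)-(c) verified:", T)
\end{verbatim}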

Lemma C.2 Suppose $P_{X,Y}$, $Q_{X,Y}$, and $T_{X,Y,X',Y'}$ are as described in Lemma C.1. Then
\[
\frac{1}{1 - 2\epsilon} \mathcal{R}(\hat{N}, T_{X,Y,Y',X'}, D) \subseteq \mathcal{R}\left(N, P_{X,Y}, \frac{1}{1 - 2\epsilon} D\right).
\]

Proof. The main idea of the proof is as follows. Since the probability of the event $(X, Y) = (X', Y')$ is $1 - \epsilon$, when the block length $n$ is sufficiently large, the number of indices $i$ with $(X_i, Y_i) = (X'_i, Y'_i)$ in the length-$n$ sequence $(X^n, Y^n)$ exceeds $n(1 - 2\epsilon)$ with high probability by the weak law of large numbers. We apply this property to show that any rate-$R$, length-$n$ block code $C_n$ for $\hat{N}$ with source vector $X$ and side-information vector $(Y, Y', X')$ which achieves $D$ can be used to construct a length-$n(1 - 2\epsilon)$ block code for $N$ with source and side information vectors $X^{n(1-2\epsilon)}$ and $Y^{n(1-2\epsilon)}$ that has rate no greater than $\frac{1}{1-2\epsilon} R$ and satisfies the distortion constraint given by $\frac{1}{1-2\epsilon} D$.

Let $\tau > 0$ and $R \in \mathcal{R}(\hat{N}, T_{X,Y,Y',X'}, D)$. Let $n$ be sufficiently large such that there exists a rate-$R$, length-$n$ block code $C_n$ which satisfies
\[
\Pr(\mathcal{T}_n) > 1 - \tau, \tag{A-2}
\]
where $\mathcal{T}_n$ is the event
\[
\mathcal{T}_n := \left\{\frac{1}{n} d\big(\theta^n(X^n), \hat{\theta}^n(v)\big) < D_{(v,\theta)} + \tau \ \ \forall (v, \theta) \in \mathcal{D}\right\}
\]
and $\hat{\theta}^n(v)$ is the reproduction of $\theta^n(X^n)$ at node $v$ using $C_n$ for all $(v, \theta) \in \mathcal{D}$. (A-2) can be achieved by applying the weak law of large numbers to a long code constructed by repeatedly using a code which achieves $D_{(v,\theta)} + \tau/2$. For any $I \subseteq \{1, \ldots, n\}$, define

\[
B_\epsilon^{(n)}(I) := \{(x^n, x'^n, y^n, y'^n) \mid (x_i, y_i) \ne (x'_i, y'_i) \text{ if and only if } i \in I\}.
\]
Let
\[
\mathcal{L}(I) := \{(a^{|I|}, b^{|I|}) \in \mathcal{A}^{|I|} \times \mathcal{A}^{|I|} \mid a_i \ne b_i \ \forall i \in I\}.
\]
For any sequence $(a^{|I|}, b^{|I|}) \in \mathcal{L}(I)$, let
\[
B_\epsilon^{(n)}(I, (a^{|I|}, b^{|I|})) := \left\{(x^n, x'^n, y^n, y'^n) \mid ((x_I, y_I), (x'_I, y'_I)) = (a^{|I|}, b^{|I|})\right\} \cap B_\epsilon^{(n)}(I)
\]
denote the set of sequences $(x^n, x'^n, y^n, y'^n) \in B_\epsilon^{(n)}(I)$ such that $((x_i, y_i), (x'_i, y'_i)) = (a_i, b_i)$ for all $i \in I$. Let
\[
B_\epsilon^{(n)} := \bigcup_{I \subseteq \{1, \ldots, n\},\ |I| < 2n\epsilon} B_\epsilon^{(n)}(I)
\]
denote the set of sequences $(x^n, x'^n, y^n, y'^n)$ for which there are at most $2n\epsilon$ indices $i$ such that $(x_i, y_i) \ne (x'_i, y'_i)$. Since $\Pr((X, Y) \ne (X', Y')) = \epsilon$,

\[
\Pr\left(B_\epsilon^{(n)}\right) > 1 - \tau
\]
when $n$ is sufficiently large by the weak law of large numbers. By the union bound,
\[
\Pr\left(B_\epsilon^{(n)} \cap \mathcal{T}_n\right) > 1 - 2\tau. \tag{A-3}
\]

Rewrite inequality (A-3) as
\[
\sum_{\substack{I \subseteq \{1, \ldots, n\} \\ |I| < 2n\epsilon}} \ \sum_{(a^{|I|}, b^{|I|}) \in \mathcal{L}(I)} \Pr\Big(B_\epsilon^{(n)}(I, (a^{|I|}, b^{|I|}))\Big)\, \Pr\Big(\mathcal{T}_n \ \Big|\ B_\epsilon^{(n)}(I, (a^{|I|}, b^{|I|}))\Big) = \Pr\Big(B_\epsilon^{(n)} \cap \mathcal{T}_n\Big) > 1 - 2\tau.
\]

Therefore, there exist $\hat{I} \subseteq \{1, \ldots, n\}$ and
\[
(a_0^{|\hat{I}|}, b_0^{|\hat{I}|}) = \big((x'^{|\hat{I}|}, y'^{|\hat{I}|}), (x''^{|\hat{I}|}, y''^{|\hat{I}|})\big) \in \mathcal{L}(\hat{I})
\]
such that $|\hat{I}| < 2n\epsilon$ and
\[
\Pr\Big(\mathcal{T}_n \ \Big|\ B_\epsilon^{(n)}(\hat{I}, (a_0^{|\hat{I}|}, b_0^{|\hat{I}|}))\Big) > 1 - 2\tau. \tag{A-4}
\]
Without loss of generality, suppose that $\hat{I} = \{n - |\hat{I}| + 1, \ldots, n\}$.

For any $(x^{n-|\hat{I}|}, y^{n-|\hat{I}|}) \in \mathcal{A}^{n-|\hat{I}|}$,
\[
\Pr\Big((X^{n-|\hat{I}|}, Y^{n-|\hat{I}|}) = (x^{n-|\hat{I}|}, y^{n-|\hat{I}|}) \ \Big|\ (X^n, Y^n, X'^n, Y'^n) \in B_\epsilon^{(n)}(\hat{I}, (a_0^{|\hat{I}|}, b_0^{|\hat{I}|}))\Big)
= T_{X^{n-|\hat{I}|}, Y^{n-|\hat{I}|} \mid \{(X^{n-|\hat{I}|}, Y^{n-|\hat{I}|}) = (X'^{n-|\hat{I}|}, Y'^{n-|\hat{I}|})\}}\big(x^{n-|\hat{I}|}, y^{n-|\hat{I}|}\big)
= P_{X^{n-|\hat{I}|}, Y^{n-|\hat{I}|}}\big(x^{n-|\hat{I}|}, y^{n-|\hat{I}|}\big). \tag{A-5}
\]

Let $\hat{C}_{n-|\hat{I}|}$ be the length-$(n - |\hat{I}|)$ block code for the input sequence $(X^{n-|\hat{I}|}, Y^{n-|\hat{I}|})$ obtained by applying code $C_n$ to
\[
\big((X^{n-|\hat{I}|}, x'^{|\hat{I}|}), (Y^{n-|\hat{I}|}, y'^{|\hat{I}|}), (X^{n-|\hat{I}|}, x''^{|\hat{I}|}), (Y^{n-|\hat{I}|}, y''^{|\hat{I}|})\big).
\]

By (A-4), the code $\hat{C}_{n-|\hat{I}|}$ has expected average distortions
\[
\frac{1}{n - |\hat{I}|} E\, d\big(\theta^{n-|\hat{I}|}, \hat{\theta}^{n-|\hat{I}|}(v)\big) \le (1 - 2\tau)\left(\frac{n}{n - |\hat{I}|} D_{(v,\theta)} + \frac{n}{n - |\hat{I}|}\tau\right) + 2\tau d_{\max}
\]
for all $(v, \theta) \in \mathcal{D}$. Therefore, since $|\hat{I}| < 2n\epsilon$, $\hat{C}_{n-|\hat{I}|}$ has rate
\[
\frac{n}{n - |\hat{I}|} R \le \frac{1}{1 - 2\epsilon} R
\]
and expected average distortion vector no greater than
\[
(1 - 2\tau)\left(\frac{1}{1 - 2\epsilon} D + \frac{1}{1 - 2\epsilon}\tau \cdot \mathbf{1}\right) + 2\tau d_{\max} \cdot \mathbf{1}.
\]

By (A-5), the expected distortion is evaluated according to $P_{X,Y}$. Hence, by letting $\tau \to 0$,
\[
\frac{1}{1 - 2\epsilon} R \in \mathcal{R}\left(N, P_{X,Y}, \frac{1}{1 - 2\epsilon} D\right).
\]
This completes the proof. $\Box$
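As a quick check of the scaling (our own remark), $|\hat{I}| < 2n\epsilon$ gives $n - |\hat{I}| > n(1 - 2\epsilon)$, so
\[
\frac{n}{n - |\hat{I}|} < \frac{1}{1 - 2\epsilon},
\]
which is exactly the blow-up factor appearing in both the rate and the distortion constraints of Lemma C.2.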

Lemma C.3 Suppose $P_{X,Y}$, $Q_{X,Y}$, and $T_{X,Y,X',Y'}$ are as described in Lemma C.1. Then
\[
\frac{1}{1 - 2\epsilon} \mathcal{R}_L(\hat{N}, T_{X,Y,Y',X'}) \subseteq \mathcal{R}_L(N, P_{X,Y}).
\]

Proof. The proof is similar to that of Lemma C.2. Let $\tau > 0$ and $R \in \mathcal{R}_L(\hat{N}, T_{X,Y,Y',X'})$. Let $n$ be sufficiently large such that there exists a rate-$R$, length-$n$ block code $C_n$ which satisfies
\[
\Pr\left(\theta^n(X^n) = \hat{\theta}^n(v) \ \ \forall (v, \theta) \in \mathcal{D}\right) \ge 1 - \tau,
\]
where $\hat{\theta}^n(v)$ is the reproduction of $\theta^n(X^n)$ at node $v$ using $C_n$ for all $(v, \theta) \in \mathcal{D}$. Let
\[
\mathcal{E}_n := \left\{\theta^n(X^n) \ne \hat{\theta}^n(v) \text{ for some } (v, \theta) \in \mathcal{D}\right\}
\]
denote the event of a decoding error under code $C_n$. Let $\mathcal{L}(I)$ (for $I \subseteq \{1, \ldots, n\}$) and $B_\epsilon^{(n)}(I, (a^{|I|}, b^{|I|}))$ (for $(a^{|I|}, b^{|I|}) \in \mathcal{L}(I)$) be the sets defined in the proof of Lemma C.2. The same argument as in Lemma C.2 leads to
\[
\Pr\Big(\mathcal{E}_n^c \ \Big|\ B_\epsilon^{(n)}(\hat{I}, (a_0^{|\hat{I}|}, b_0^{|\hat{I}|}))\Big) > 1 - 2\tau \tag{A-6}
\]
for some $\hat{I} \subseteq \{1, \ldots, n\}$ and $(a_0^{|\hat{I}|}, b_0^{|\hat{I}|}) = \big((x'^{|\hat{I}|}, y'^{|\hat{I}|}), (x''^{|\hat{I}|}, y''^{|\hat{I}|})\big) \in \mathcal{L}(\hat{I})$ such that $|\hat{I}| < 2n\epsilon$. Without loss of generality, suppose that $\hat{I} = \{n - |\hat{I}| + 1, \ldots, n\}$.

Let $\hat{C}_{n-|\hat{I}|}$ be the length-$(n - |\hat{I}|)$ block code for the input sequence $(X^{n-|\hat{I}|}, Y^{n-|\hat{I}|})$ obtained by applying code $C_n$ to
\[
\big((X^{n-|\hat{I}|}, x'^{|\hat{I}|}), (Y^{n-|\hat{I}|}, y'^{|\hat{I}|}), (X^{n-|\hat{I}|}, x''^{|\hat{I}|}), (Y^{n-|\hat{I}|}, y''^{|\hat{I}|})\big).
\]
By (A-6), the code $\hat{C}_{n-|\hat{I}|}$ has decoding error probability no greater than $2\tau$ and has rate no greater than
\[
\frac{1}{1 - 2\epsilon} R.
\]

By (A-5), the error probability is evaluated according to $P_{X,Y}$. Hence, by letting $\tau \to 0$,
\[
\frac{1}{1 - 2\epsilon} R \in \mathcal{R}_L(N, P_{X,Y}).
\]
This completes the proof. $\Box$

Lemma C.4 Suppose $P_{X,Y}$, $Q_{X,Y}$, and $T_{X,Y,X',Y'}$ are as described in Lemma C.1. Then
\[
\frac{1}{1 - 2\epsilon} \mathcal{R}_Z(\hat{N}, T_{X,Y,Y',X'}) \subseteq \mathcal{R}_Z(N, P_{X,Y}).
\]

Proof. The proof is similar to that of Lemma C.2. Let $\tau > 0$ and $R \in \mathcal{R}_Z(\hat{N}, T_{X,Y,Y',X'})$. Let $n$ be sufficiently large such that there exists a dimension-$n$ zero-error variable-length code $C_n$ with length vector $L^{(n)}$ which satisfies
\[
\Pr(\mathcal{K}_n) > 1 - \tau,
\]
where
\[
\mathcal{K}_n := \left\{\frac{1}{n} L^{(n)}(X^n, Y^n) \le R + \tau \cdot \mathbf{1}\right\} \tag{A-7}
\]
is the event that the average length vector is no greater than $R + \tau \cdot \mathbf{1}$. (A-7) can be achieved by applying the weak law of large numbers to a long code constructed by repeatedly using the codewords of a zero-error variable-length code whose expected average length vector is no greater than $R + (\tau/2) \cdot \mathbf{1}$. Let $\mathcal{L}(I)$ (for $I \subseteq \{1, \ldots, n\}$) and $B_\epsilon^{(n)}(I, (a^{|I|}, b^{|I|}))$ (for $(a^{|I|}, b^{|I|}) \in \mathcal{L}(I)$) be the sets defined in the proof of Lemma C.2. The same argument as in Lemma C.2 leads to
\[
\Pr\Big(\mathcal{K}_n \ \Big|\ B_\epsilon^{(n)}(\hat{I}, (a_0^{|\hat{I}|}, b_0^{|\hat{I}|}))\Big) > 1 - 2\tau \tag{A-8}
\]
for some $\hat{I} \subseteq \{1, \ldots, n\}$ and $(a_0^{|\hat{I}|}, b_0^{|\hat{I}|}) = \big((x'^{|\hat{I}|}, y'^{|\hat{I}|}), (x''^{|\hat{I}|}, y''^{|\hat{I}|})\big) \in \mathcal{L}(\hat{I})$ such that $|\hat{I}| < 2n\epsilon$. Without loss of generality, suppose that $\hat{I} = \{n - |\hat{I}| + 1, \ldots, n\}$.

Let $\hat{C}_{n-|\hat{I}|}$ be the length-$(n - |\hat{I}|)$ block code for the input sequence $(X^{n-|\hat{I}|}, Y^{n-|\hat{I}|})$ obtained by applying code $C_n$ to
\[
\big((X^{n-|\hat{I}|}, x'^{|\hat{I}|}), (Y^{n-|\hat{I}|}, y'^{|\hat{I}|}), (X^{n-|\hat{I}|}, x''^{|\hat{I}|}), (Y^{n-|\hat{I}|}, y''^{|\hat{I}|})\big).
\]

By (A-8), the zero-error variable-length code $\hat{C}_{n-|\hat{I}|}$ has expected average length vector no greater than
\[
\frac{n}{n - |\hat{I}|}\Big((1 - 2\tau)(R + \tau \cdot \mathbf{1}) + \big(2\tau(s + 2t)\log m\big) \cdot \mathbf{1}\Big)
\le \frac{1}{1 - 2\epsilon}\Big((1 - 2\tau)(R + \tau \cdot \mathbf{1}) + \big(2\tau(s + 2t)\log m\big) \cdot \mathbf{1}\Big),
\]
where the value $(s + 2t)\log m$ upper bounds all possible average lengths over each edge $e \in \mathcal{E}$, since for each $e \in \mathcal{E}$ there are at most $m^{s+2t}$ codewords under $C_n$. By (A-5), the expected length is evaluated according to $P_{X,Y}$. Hence, by letting $\tau \to 0$,
\[
\frac{1}{1 - 2\epsilon} R \in \mathcal{R}_Z(N, P_{X,Y}).
\]
This completes the proof. $\Box$

D Proof of Theorem 4.4.1

Let $J_2^*(\lambda) = \min_{\{P_Z(z)\}_{z \in \mathcal{Z}}} J_2(\lambda)$ be the optimal value of $J_2(\lambda)$ for the Wyner-Ziv rate region, and let $\hat{J}_2(\lambda)$ be the value computed by the algorithm proposed in Section 4.4. Then $\hat{J}_2(\lambda) \ge J_2^*(\lambda)$, since the algorithm finds an auxiliary random variable $Z$ achieving the Lagrangian value that it reports. We next find $(\eta, \delta)$ to guarantee that $\hat{J}_2(\lambda) \le (1 + \epsilon) J_2^*(\lambda)$.

Recall that $J_2(\lambda) := I(X; Z \mid Y) + \lambda \min_\psi E\, d(X, \psi(Y, Z))$. Before bounding $\hat{J}_2(\lambda) - J_2^*(\lambda)$, rewrite $J_2(\lambda)$ as
\[
J_2(\lambda) = H(X \mid Y) - H(Y \mid X) + H(Y \mid Z) - H(X \mid Z) + \lambda \min_\psi E\, d(X, \psi(Y, Z))
\]
\[
= H(X \mid Y) - H(Y \mid X) + \sum_{z \in \mathcal{Z}} P_Z(z) H(Y \mid Z = z) - \sum_{z \in \mathcal{Z}} P_Z(z)\left[H(X \mid Z = z) - \lambda \sum_{x, y} p(y \mid x)\, Q_{X|Z}(x \mid z)\, d(x, \psi(y, z))\right],
\]
where $\psi(y, z)$ is the optimizing reproduction of $X$ given the conditional $\{Q_{X|Z}(x \mid z)\}_{(x,z) \in \mathcal{X} \times \mathcal{Z}}$ and the pair $(y, z)$. Fix $\{Q_{X|Z}(x \mid z)\}_{(x,z) \in \mathcal{X} \times \mathcal{Z}}$ and let $\{\hat{Q}_{X|Z}(x \mid z)\}_{(x,z) \in \mathcal{X} \times \mathcal{Z}}$ be the quantized conditional.
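The first equality in the rewriting above relies on the Markov chain $Z - X - Y$ that holds for the Wyner-Ziv auxiliary variable; for completeness (our own verification), expand
\[
I(X; Z \mid Y) = H(X \mid Y) - H(X \mid Y, Z), \qquad
H(X \mid Y, Z) = H(X) + H(Y \mid X) + H(Z \mid X) - H(Z) - H(Y \mid Z),
\]
where the second identity uses $H(Z \mid X, Y) = H(Z \mid X)$; substituting $H(Z \mid X) - H(Z) = H(X \mid Z) - H(X)$ then gives
\[
I(X; Z \mid Y) = H(X \mid Y) - H(Y \mid X) + H(Y \mid Z) - H(X \mid Z).
\]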

Let
\[
Q_{Y|Z}(y \mid z) = \sum_x p(y \mid x) Q_{X|Z}(x \mid z), \qquad \hat{Q}_{Y|Z}(y \mid z) = \sum_x p(y \mid x) \hat{Q}_{X|Z}(x \mid z)
\]
be the corresponding conditionals on $Y$ given $Z$. Then
\[
(1 - \eta) Q_{X|Z}(x \mid z) - \delta \le \hat{Q}_{X|Z}(x \mid z) \le (1 + \eta) Q_{X|Z}(x \mid z),
\]
\[
(1 - \eta) Q_{Y|Z}(y \mid z) - \delta \le \hat{Q}_{Y|Z}(y \mid z) \le (1 + \eta) Q_{Y|Z}(y \mid z)
\]
for all $x$, $y$, and $z$. Finally, let $\{P_Z(z)\}_{z \in \mathcal{Z}}$ be the marginal on the auxiliary random variable $Z$ that achieves $J_2^*(\lambda)$ and define $\tau := \eta \log \frac{e}{1 - \eta}$. By Lemma 4.2.1, when $(\max\{|\mathcal{X}|, |\mathcal{Y}|\})\, \delta \log \frac{1}{\delta} < \tau$,
\[
|H(\hat{Q}_{X|Z=z}) - H(Q_{X|Z=z})| \le \eta H(Q_{X|Z=z}) + 2\tau,
\]
\[
|H(\hat{Q}_{Y|Z=z}) - H(Q_{Y|Z=z})| \le \eta H(Q_{Y|Z=z}) + 2\tau
\]
for every $z \in \mathcal{Z}$.

Let $\mathcal{Z}' := \mathcal{Z} \cup \{z_x\}_{x \in \mathcal{X}}$ and set
\[
\hat{Q}_{X|Z}(t \mid z_x) =
\begin{cases}
1, & \text{if } t = x, \\
0, & \text{otherwise,}
\end{cases}
\]
\[
\hat{P}_Z(z) =
\begin{cases}
(1 - \eta) P_Z(z), & \text{if } z \in \mathcal{Z}, \\
P_X(x) - \sum_{z \in \mathcal{Z}} \hat{Q}_{X|Z}(x \mid z) \hat{P}_Z(z), & \text{if } z = z_x,
\end{cases}
\]
\[
\hat{\psi}(y, z) =
\begin{cases}
\psi(y, z), & \text{if } z \in \mathcal{Z}, \\
x, & \text{if } z = z_x.
\end{cases}
\]
Since $\hat{J}_2(\lambda)$ is optimized over all quantized distributions, $\hat{J}_2(\lambda) \le J_2(\lambda)\big|_{\hat{Q}_{X|Z}, \hat{P}_Z, \hat{\psi}}$. Thus

\[
\hat{J}_2(\lambda) - J_2^*(\lambda) \le \left[(2\eta - \eta^2) H(X \mid Z) - \eta^2 H(Y \mid Z)\right]_{P_Z, Q_{X|Z}} + (2\eta - \eta^2) H(Y \mid X) + 4(1 - \eta)\tau + |\mathcal{Y}||\mathcal{Z}|\,\delta(1 - \eta)
\]
\[
\le \eta\big((2 - \eta)(|\mathcal{X}| + |\mathcal{Y}|) + 8 + |\mathcal{Y}||\mathcal{Z}|\big) \le \eta\big(2|\mathcal{X}| + 2|\mathcal{Y}| + 8 + |\mathcal{Y}||\mathcal{Z}|\big)
\]
when $\delta < \eta < 1 - \frac{e}{4}$ and $(\max\{|\mathcal{X}|, |\mathcal{Y}|\})\, \delta \log \frac{1}{\delta} < \tau$. Define
\[
L(\lambda) := \min_{0 \le D \le D_{\max}} \big(R_{X|Y}(D) + \lambda D\big),
\]
where $R_{X|Y}(D)$ is the conditional rate-distortion function for $X$ given $Y$. Then $J_2^*(\lambda) \ge L(\lambda)$ implies

\[
\hat{J}_2(\lambda) - J_2^*(\lambda) \le \eta\big(2|\mathcal{X}| + 2|\mathcal{Y}| + 8 + |\mathcal{Y}||\mathcal{Z}|\big) \le \epsilon L(\lambda) \le \epsilon J_2^*(\lambda).
\]
We therefore wish to choose $\delta$ and $\eta$ to satisfy
\[
\delta < \eta = \frac{L(\lambda)}{2|\mathcal{X}| + 2|\mathcal{Y}| + 8 + |\mathcal{Y}||\mathcal{Z}|}\,\epsilon < 1 - \frac{e}{4}.
\]

Define $f(x) := -x \log(x)$ for $x \in [0, 1/e)$. The function $f$ is strictly increasing on this interval and therefore invertible. Setting
\[
\eta = \min\left\{\frac{L(\lambda)}{2|\mathcal{X}| + 2|\mathcal{Y}| + 8 + |\mathcal{Y}||\mathcal{Z}|}\,\epsilon, \ 1 - \frac{e}{4}\right\} \tag{A-9}
\]
\[
\delta = \min\left\{\eta, \ f^{-1}\left(\min\left\{\frac{\eta}{|\mathcal{X}|}, \frac{\eta}{|\mathcal{Y}|}\right\}\right)\right\} \tag{A-10}
\]
yields $(\max\{|\mathcal{X}|, |\mathcal{Y}|\})\, \delta \log \frac{1}{\delta} < \tau$ as desired and guarantees a $(1 + \epsilon)$-approximation.

The interior-point solver for $k$ variables runs in time $O(k^4)$ [54]. Since $k = |\mathcal{Z}|$ for our linear program, our algorithm runs in time $O(N(\delta, \eta, |\mathcal{X}|)^4)$. Applying the given choice of $\delta$ and $\eta$, our algorithm runs in time $O(\epsilon^{-4(|\mathcal{X}|+1)})$ as $\epsilon$ approaches 0.

E Proof of Theorem 4.5.1

Let $J_3^*(\lambda_1, \lambda_2, \lambda_3) = \min J_3(\lambda_1, \lambda_2, \lambda_3)$ be the optimal value of $J_3(\lambda_1, \lambda_2, \lambda_3)$ for the lossy coded side information region, and let $\hat{J}_3(\lambda_1, \lambda_2, \lambda_3)$ be the value computed by the algorithm proposed in Section 4.5. Then $\hat{J}_3(\lambda_1, \lambda_2, \lambda_3) \ge J_3^*(\lambda_1, \lambda_2, \lambda_3)$. We next find $(\eta, \delta)$ such that $\hat{J}_3(\lambda_1, \lambda_2, \lambda_3) \le (1 + \epsilon) J_3^*(\lambda_1, \lambda_2, \lambda_3)$.

Let $\mathcal{Z}_1' = \mathcal{Z}_1 \cup \{z_{x_1}\}_{x_1 \in \mathcal{X}_1}$ and set
\[
\hat{Q}_{X_1|Z_1}(t \mid z_{x_1}) =
\begin{cases}
1, & \text{if } t = x_1, \\
0, & \text{otherwise.}
\end{cases}
\]

Let $P_{Z_1}$ and $Q_{Z_2|X_2}$ be distributions (on $Z_1$, and on $Z_2$ given $X_2$) that achieve $J_3^*(\lambda_1, \lambda_2, \lambda_3)$. Define
\[
\hat{P}_{Z_1}(z) =
\begin{cases}
(1 - \eta') P_{Z_1}(z), & z \in \mathcal{Z}_1, \\
P_{X_1}(x_1) - \sum_{z \in \mathcal{Z}_1} \hat{Q}_{X_1|Z_1}(x_1 \mid z) \hat{P}_{Z_1}(z), & z = z_{x_1}, \ x_1 \in \mathcal{X}_1.
\end{cases}
\]
Let $\tau' := \eta' \log \frac{e}{1 - \eta'}$ and $\delta' := (|\mathcal{X}_1| + 1)\delta$, with $\eta' = 3\eta$. Choose $\delta > 0$ such that $|\mathcal{X}_1||\mathcal{Z}_2|\, \delta' \log \frac{1}{\delta'} \le \tau'$. By Lemma 4.2.1, for all $x_1 \in \mathcal{X}_1$, $x_2 \in \mathcal{X}_2$, and $z_1 \in \mathcal{Z}_1$,

\[
|H(\hat{Q}_{X_1|Z_1=z_1}) - H(Q_{X_1|Z_1=z_1})| \le \eta' H(Q_{X_1|Z_1=z_1}) + 2\tau',
\]
\[
|H(\hat{Q}_{X_1,Z_2|Z_1=z_1}) - H(Q_{X_1,Z_2|Z_1=z_1})| \le \eta' H(Q_{X_1,Z_2|Z_1=z_1}) + 2\tau',
\]
\[
|H(\hat{Q}_{Z_2|Z_1=z_1}) - H(Q_{Z_2|Z_1=z_1})| \le \eta' H(Q_{Z_2|Z_1=z_1}) + 2\tau',
\]
\[
|H(\hat{Q}_{Z_2|X_2=x_2}) - H(Q_{Z_2|X_2=x_2})| \le \eta' H(Q_{Z_2|X_2=x_2}) + 2\tau',
\]
\[
|H(\hat{Q}_{Z_2}) - H(Q_{Z_2})| \le \eta' H(Q_{Z_2}) + 2\tau',
\]
where $\hat{Q}_{X_1,Z_2|Z_1}$, $\hat{Q}_{Z_2|Z_1}$, and $\hat{Q}_{Z_2}$ derive from $(\hat{Q}_{X_1|Z_1}, \hat{Q}_{Z_2|X_2})$. Let $\psi$ be the optimal reproduction function for $X_1$ for $(P_{Z_1}, Q_{Z_2|X_2})$. Extend the function $\psi(z_1, z_2)$ to $\psi(z_{x_1}, z_2) = x_1$ for all $x_1 \in \mathcal{X}_1$. Now

\[
I(X_1; Z_1 \mid Z_2) = H(X_1 \mid Z_2) - H(X_1 \mid Z_1, Z_2) = H(X_1 \mid Z_2) - H(X_1, Z_2 \mid Z_1) + H(Z_2 \mid Z_1),
\]
\[
I(X_2; Z_2) = H(Z_2) - H(Z_2 \mid X_2),
\]
and hence we have

\[
\hat{H}(X_1 \mid Z_1) \le (1 - \eta'^2) H(X_1 \mid Z_1) + 2(1 - \eta')\tau',
\]
\[
\hat{H}(X_1, Z_2 \mid Z_1) \ge (1 - \eta')^2 H(X_1, Z_2 \mid Z_1) - 2(1 - \eta')\tau',
\]
\[
\hat{H}(Z_2 \mid Z_1) \ge (1 - \eta')^2 H(Z_2 \mid Z_1) - 2(1 - \eta')\tau',
\]
\[
\hat{H}(Z_2 \mid X_2) \ge (1 - \eta')^2 H(Z_2 \mid X_2) - 2(1 - \eta')\tau',
\]
\[
E\, d(X_1, \psi(Z_1, Z_2))\big|_{\hat{P}_{Z_1}, \hat{Q}_{Z_2|X_2}} = (1 - \eta')\, E\, d(X_1, \psi(Z_1, Z_2))\big|_{P_{Z_1}, Q_{Z_2|X_2}},
\]
where $\hat{H}$ denotes entropy evaluated under the quantized distributions. Since $\hat{J}_3(\lambda_1, \lambda_2, \lambda_3) \le J_3(\lambda_1, \lambda_2, \lambda_3)\big|_{\hat{P}_{Z_1}, \{\hat{Q}_{X_1|Z_1}\}, \hat{Q}_{Z_2|X_2}}$, by taking $\eta' < 1 - \frac{e}{4}$ we have
\[
\hat{J}_3(\lambda_1, \lambda_2, \lambda_3) - J_3^*(\lambda_1, \lambda_2, \lambda_3)
\le 2\lambda_1 \eta'\big(|\mathcal{X}_1||\mathcal{Z}_1| + |\mathcal{Z}_1|\big) + \lambda_2\big(\eta'|\mathcal{Z}_2| + 2\eta'|\mathcal{Z}_2|\big) + (6\lambda_1 + 4\lambda_2)(1 - \eta')\tau'
\]
\[
\le \eta'\big(2\lambda_1(|\mathcal{X}_1||\mathcal{Z}_1| + |\mathcal{Z}_1|) + 3\lambda_2|\mathcal{Z}_2| + (12\lambda_1 + 8\lambda_2)\big).
\]
Define
\[
L(\lambda_1, \lambda_2, \lambda_3) := \min_D \left[\min\{\lambda_1, \lambda_2\} R_{X_1}(D) + \lambda_3 D\right].
\]
Set

\[
\eta = \min\left\{\frac{4 - e}{12}, \ \frac{\epsilon L(\lambda_1, \lambda_2, \lambda_3)}{T(\lambda_1, \lambda_2, \lambda_3)}\right\} \tag{A-11}
\]
\[
\delta = \frac{1}{|\mathcal{X}_1| + 1} f^{-1}\left(\frac{\eta}{3|\mathcal{X}_1||\mathcal{Z}_2|}\right), \tag{A-12}
\]
where
\[
T(\lambda_1, \lambda_2, \lambda_3) := 6\lambda_1(|\mathcal{X}_1||\mathcal{Z}_1| + |\mathcal{Z}_1|) + 9\lambda_2|\mathcal{Z}_2| + (36\lambda_1 + 24\lambda_2).
\]

Then the pair $(\eta, \delta)$ satisfies the following inequalities:
\[
\delta' = (|\mathcal{X}_1| + 1)\delta,
\]
\[
\eta' = 3\eta < 1 - \frac{e}{4},
\]
\[
|\mathcal{X}_1||\mathcal{Z}_2|\, \delta' \log \frac{1}{\delta'} \le \tau',
\]
\[
\eta' < \frac{\epsilon L(\lambda_1, \lambda_2, \lambda_3)}{2\lambda_1(|\mathcal{X}_1||\mathcal{Z}_1| + |\mathcal{Z}_1|) + 3\lambda_2|\mathcal{Z}_2| + (12\lambda_1 + 8\lambda_2)},
\]
which gives
\[
J_3^*(\lambda_1, \lambda_2, \lambda_3) \le \hat{J}_3(\lambda_1, \lambda_2, \lambda_3) \le (1 + \epsilon) J_3^*(\lambda_1, \lambda_2, \lambda_3).
\]

In this algorithm, there are $N(\delta, \eta, |\mathcal{X}_1|)$ variables in each of the linear programs in the inner loop, and there are $N(\delta, \eta, |\mathcal{X}_2|)^{|\mathcal{X}_2|}$ quantized conditional probabilities $\hat{Q}_{Z_2|X_2}$ in the outer loop. Applying the given choice of $\delta$ and $\eta$, our algorithm runs in time $O(\epsilon^{-4(|\mathcal{X}_1| + |\mathcal{X}_2|^2 + 1)})$ as $\epsilon$ approaches 0.

Bibliography

[1] C. E. Shannon. Coding theorems for a discrete source with a fidelity criterion.

In IRE National Convention Record, volume 4, pages 142–163, 1959.

[2] D. Slepian and J. K. Wolf. Noiseless coding of correlated information sources.

IEEE Transactions on Information Theory, IT-19:471–480, 1973.

[3] R. M. Gray and A. D. Wyner. Source coding for a simple network. Bell System Technical Journal, 53(9):1681–1721, November 1974.

[4] R. Ahlswede and J. Körner. Source coding with side information and a converse for degraded broadcast channels. IEEE Transactions on Information Theory, IT-21(6):629–637, November 1975.

[5] A. D. Wyner and J. Ziv. The rate-distortion function for source coding with side information at the decoder. IEEE Transactions on Information Theory, IT-22(1):1–10, January 1976.

[6] T. Berger, K. B. Housewright, J. K. Omura, S. Tung, and J. Wolfowitz. An upper bound on the rate distortion function for source coding with partial side information at the decoder. IEEE Transactions on Information Theory, IT-25(6):664–666, November 1979.

[7] T. Berger and R. Yeung. Multiterminal source encoding with one distortion criterion. IEEE Transactions on Information Theory, IT-35(2):228–236, March 1989.

[8] C. Heegard and T. Berger. Rate distortion when side information may be absent.

IEEE Transactions on Information Theory, IT-31:727–734, November 1985.

[9] H. Yamamoto. Source coding theory for cascade and branching communication systems. IEEE Transactions on Information Theory, IT-27:299–308, May 1981.

[10] R. Ahlswede, N. Cai, S. Y. R. Li, and R. Yeung. Network information flow.

IEEE Transactions on Information Theory, IT-46(4):1204–1216, July 2000.

[11] T. Ho, M. Médard, R. Koetter, D. R. Karger, M. Effros, J. Shi, and B. Leong. A random linear network coding approach to multicast. IEEE Transactions on Information Theory, IT-52(10):4413–4430, October 2006.

[12] M. Bakshi and M. Effros. On achievable rates for multicast in the presence of side information. In Proceedings of the IEEE International Symposium on Information Theory, Toronto, Canada, July 2008.

[13] H. Yamamoto. Wyner-Ziv theory for a general function of the correlated sources.

IEEE Transactions on Information Theory, IT-28(5):803–807, September 1982.

[14] A. Orlitsky and J. R. Roche. Coding for computing. IEEE Transactions on Information Theory, IT-47(3):903–917, March 2001.

[15] H. Feng, M. Effros, and S. Savari. Functional source coding for networks with receiver side information. Proceedings of the Allerton Conference on Communication, Control, and Computing, September 2004.

[16] W. Gu and M. Effros. On the concavity of lossless rate regions. In Proceedings of the IEEE International Symposium on Information Theory, Seattle, WA, June 2006.

[17] W. Gu and M. Effros. On the continuity of achievable rate regions for source coding over networks. In Proceedings of the IEEE Information Theory Workshop, Tahoe, CA, September 2007.

[18] W. Gu, M. Effros, and M. Bakshi. A continuity theory for lossless source coding over networks. Proceedings of the Allerton Conference on Communication, Control, and Computing, September 2008.

[19] I. Csiszár and J. Körner. Information Theory: Coding Theorems for Discrete Memoryless Systems. London, U.K.: Academic, 1981.

[20] R. Ahlswede, P. Gács, and J. Körner. Bounds on conditional probabilities with applications in multi-user communication. Probability Theory and Related Fields, 34(2):157–177, June 1976.

[21] Y. Oohama and T. Han. Universal coding for the Slepian-Wolf data compression system and the strong converse theorem. IEEE Transactions on Information Theory, IT-40(6):1908–1919, November 1994.

[22] S. Arimoto. An algorithm for computing the capacity of arbitrary discrete memoryless channels. IEEE Transactions on Information Theory, IT-18:14–20, January 1972.

[23] R. E. Blahut. Computation of channel capacity and rate-distortion functions. IEEE Transactions on Information Theory, IT-18(4):460–473, July 1972.

[24] F. Dupuis, W. Yu, and F. M. J. Willems. Blahut-Arimoto algorithms for computing channel capacity and rate-distortion with side information. In Proceedings of the IEEE International Symposium on Information Theory, Chicago, IL, June 2004.

[25] W. Gu, R. Koetter, M. Effros, and T. Ho. On source coding with coded side information for a binary source with binary side information. In Proceedings of the IEEE International Symposium on Information Theory, Nice, France, June 2007.

[26] W. Gu and M. Effros. Approximation on computing rate regions for coded side

Dokumen terkait