Some Inequalities for the Relative Entropy and Applications

This is the Published version of the following publication

Dragomir, Sever S (1998) Some Inequalities for the Relative Entropy and Applications. RGMIA research report collection, 2 (4).

The publisher’s official version can be found at

Note that access to this version may require subscription.

Downloaded from VU Research Repository https://vuir.vu.edu.au/17228/


SOME INEQUALITIES FOR THE RELATIVE ENTROPY AND APPLICATIONS

S.S. DRAGOMIR

Abstract. Some new inequalities for the relative entropy and applications are given.

1. Introduction

The relative entropy is a measure of the distance between two distributions. In statistics, it arises as an expected logarithm of the likelihood ratio. The relative entropy $D(p\|q)$ is a measure of the inefficiency of assuming that the distribution is $q$ when the true distribution is $p$. For example, if we knew the true distribution of the random variable, then we could construct a code with average description length $H(p)$. If, instead, we used the code for a distribution $q$, we would need $H(p)+D(p\|q)$ bits on the average to describe the random variable [1, p. 18].

In what follows, unless otherwise specified, $\log$ will denote $\log_{b}$, where $b$ is a given base greater than $1$.

Definition 1. The relative entropy or Kullback-Leibler distance between two probability mass functions $p(x)$ and $q(x)$ is defined by
$$D(p\|q) := \sum_{x\in\mathcal{X}} p(x)\log\frac{p(x)}{q(x)} = E_{p}\log\frac{p(X)}{q(X)}. \tag{1.1}$$

In the above definition, we use the convention (based on continuity arguments) that $0\log\frac{0}{q}=0$ and $p\log\frac{p}{0}=\infty$.
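For instance (an illustrative computation added here, not part of the original text), take $\mathcal{X}=\{0,1\}$, $p=\left(\tfrac12,\tfrac12\right)$ and $q=\left(\tfrac14,\tfrac34\right)$. With base $b=2$,
$$D(p\|q) = \tfrac12\log_2\frac{1/2}{1/4}+\tfrac12\log_2\frac{1/2}{3/4} = 1-\tfrac12\log_2 3 \approx 0.2075,$$
while $D(q\|p) = \tfrac14\log_2\frac{1/4}{1/2}+\tfrac34\log_2\frac{3/4}{1/2} = \tfrac34\log_2 3-1 \approx 0.1887$, which already illustrates the lack of symmetry noted in the next paragraph.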

It is well-known that relative entropy is always non-negative and is zero if and only if p=q. However, it is not a true distance between distributions since it is not symmetric and does not satisfy the triangle inequality.

The following theorem is of fundamental importance [1, p. 26].

Theorem 1. (Information Inequality) Let $p(x)$, $q(x)$, $x\in\mathcal{X}$, be two probability mass functions. Then
$$D(p\|q) \ge 0 \tag{1.2}$$
with equality if and only if
$$p(x) = q(x) \quad\text{for all } x\in\mathcal{X}. \tag{1.3}$$

Date: January , 1998.

1991 Mathematics Subject Classification. Primary 26D15; Secondary 94Xxx.

Key words and phrases. Relative Entropy, Shannon Entropy, Mutual Information.


Actually, the inequality (1.2) can be improved as follows (see [1, p. 300]):

Theorem 2. Let $p$, $q$ be as above. Then
$$D(p\|q) \ge \frac{1}{2\ln b}\,\|p-q\|_{1}^{2} \tag{1.4}$$
where $\|p-q\|_{1} = \sum_{x\in\mathcal{X}}|p(x)-q(x)|$ is the usual $1$-norm of $p-q$. The equality holds iff $p=q$.

We remark that the argument of (1.4) is not based on the convexity of the map $-\log(\cdot)$.
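For the same pair $p=\left(\tfrac12,\tfrac12\right)$, $q=\left(\tfrac14,\tfrac34\right)$ used in the example above (again an editorial illustration), $\|p-q\|_{1}=\tfrac12$, so the right-hand side of (1.4) with $b=2$ equals $\frac{1}{2\ln 2}\cdot\frac14 \approx 0.1803$, which indeed lies below $D(p\|q)\approx 0.2075$.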

To evaluate the relative entropy $D(p\|q)$ it would be interesting to establish some upper bounds.

Before we do this, let us recall some other important concepts in Information Theory.

We introduce mutual information, which is a measure of the amount of information that one random variable contains about another random variable. It is the reduction in the uncertainty of one random variable due to the knowledge of the other [1, p. 18].

Definition 2. Consider two random variables $X$ and $Y$ with a joint probability mass function $p(x,y)$ and marginal probability mass functions $p(x)$ and $q(y)$. The mutual information is the relative entropy between the joint distribution and the product distribution, i.e.,
$$I(X;Y) = \sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{Y}}p(x,y)\log\frac{p(x,y)}{p(x)q(y)} = D\bigl(p(x,y)\,\|\,p(x)q(y)\bigr) = E_{p(x,y)}\log\frac{p(X,Y)}{p(X)q(Y)}. \tag{1.5}$$
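As a small illustration (added editorially, not part of the original text), consider the joint distribution on $\{0,1\}\times\{0,1\}$ given by $p(0,0)=p(1,1)=\tfrac13$ and $p(0,1)=p(1,0)=\tfrac16$, whose marginals are uniform, $p(x)=q(y)=\tfrac12$. With $b=2$,
$$I(X;Y) = 2\cdot\tfrac13\log_2\frac{1/3}{1/4} + 2\cdot\tfrac16\log_2\frac{1/6}{1/4} \approx 0.0817 \text{ bits},$$
a positive value reflecting the mild dependence between $X$ and $Y$.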

The following corollary of Theorem 1 holds [1, p. 27].

Corollary 1. (Non-negativity of mutual information) For any two random variables $X$, $Y$ we have
$$I(X;Y) \ge 0 \tag{1.6}$$
with equality if and only if $X$ and $Y$ are independent.

An improvement of this result via Theorem 2 is as follows.

Corollary 2. For any two random variables $X$, $Y$ we have
$$I(X;Y) \ge \frac{1}{2\ln b}\left(\sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{Y}}\bigl|p(x,y)-p(x)q(y)\bigr|\right)^{2} \ge 0 \tag{1.7}$$
with equality if and only if $X$ and $Y$ are independent.

Now, let $u(x)=\frac{1}{|\mathcal{X}|}$ be the uniform probability mass function on $\mathcal{X}$ and let $p(x)$ be the probability mass function for $X$.


It is well-known that [1, p. 27]
$$D(p\|u) = \sum_{x\in\mathcal{X}}p(x)\log\frac{p(x)}{u(x)} = \log|\mathcal{X}|-H(X). \tag{1.8}$$
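Indeed, spelling the identity out (a one-line verification added for convenience), since $u(x)=\frac{1}{|\mathcal{X}|}$ for every $x\in\mathcal{X}$,
$$D(p\|u) = \sum_{x\in\mathcal{X}}p(x)\bigl(\log p(x)+\log|\mathcal{X}|\bigr) = \log|\mathcal{X}|+\sum_{x\in\mathcal{X}}p(x)\log p(x) = \log|\mathcal{X}|-H(X).$$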

The following corollary of Theorem 1 is important [1, p. 27].

Corollary 3. Let $X$ be a random variable and let $|\mathcal{X}|$ denote the number of elements in the range of $X$. Then
$$H(X) \le \log|\mathcal{X}| \tag{1.9}$$
with equality if and only if $X$ has a uniform distribution over $\mathcal{X}$.

Using Theorem 2 we can also state:

Corollary 4. Let $X$ be as above. Then
$$\log|\mathcal{X}|-H(X) \ge \frac{1}{2\ln b}\left[\sum_{x\in\mathcal{X}}\left|p(x)-\frac{1}{|\mathcal{X}|}\right|\right]^{2} \ge 0. \tag{1.10}$$

The equality holds iff $p$ is uniformly distributed on $\mathcal{X}$.

In the recent paper [2], the authors proved, among other results, the following upper bound for the relative entropy and employed it in coding theory in connection with the Noiseless Coding Theorem:

Theorem 3. Under the above assumptions for $p(x)$ and $q(x)$ we have the inequality
$$\frac{1}{\ln b}\left[\sum_{x\in\mathcal{X}}\frac{p^{2}(x)}{q(x)}-1\right] \ge D(p\|q) \tag{1.11}$$
with equality if and only if $p(x)=q(x)$ for all $x\in\mathcal{X}$.

The following upper bound for the mutual information holds.

Corollary 5. For any two random variables $X$, $Y$ we have
$$\frac{1}{\ln b}\left[\sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{Y}}\frac{p^{2}(x,y)}{p(x)q(y)}-1\right] \ge I(X;Y)$$
with equality iff $X$ and $Y$ are independent.

Finally, we note that the following upper bound for the difference $\log|\mathcal{X}|-H(X)$ is valid.

Corollary 6. We have
$$\frac{1}{\ln b}\left[|\mathcal{X}|\sum_{x\in\mathcal{X}}p^{2}(x)-1\right] \ge \log|\mathcal{X}|-H(X)$$
with equality if and only if $p$ is uniformly distributed on $\mathcal{X}$.
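Before moving on, the bounds collected so far are easy to check numerically. The following Python sketch is an editorial illustration rather than part of the original paper: it assumes finite alphabets with strictly positive probabilities; the helper names, the particular vectors $p$, $q$ and the base $b=2$ are arbitrary choices; and it simply evaluates $D(p\|q)$ together with the lower bound (1.4) and the upper bound (1.11).

```python
import math


def relative_entropy(p, q, b=2.0):
    """D(p||q) in base b, assuming strictly positive probability vectors."""
    return sum(pi * math.log(pi / qi, b) for pi, qi in zip(p, q))


def pinsker_type_lower_bound(p, q, b=2.0):
    """Right-hand side of (1.4): ||p - q||_1^2 / (2 ln b)."""
    one_norm = sum(abs(pi - qi) for pi, qi in zip(p, q))
    return one_norm ** 2 / (2.0 * math.log(b))


def chi_square_type_upper_bound(p, q, b=2.0):
    """Left-hand side of (1.11): (sum_x p(x)^2 / q(x) - 1) / ln b."""
    return (sum(pi * pi / qi for pi, qi in zip(p, q)) - 1.0) / math.log(b)


if __name__ == "__main__":
    # Arbitrary strictly positive probability vectors on a 3-letter alphabet.
    p = [0.5, 0.3, 0.2]
    q = [0.4, 0.4, 0.2]
    lower = pinsker_type_lower_bound(p, q)
    d = relative_entropy(p, q)
    upper = chi_square_type_upper_bound(p, q)
    print(f"(1.4) lower = {lower:.6f}, D(p||q) = {d:.6f}, (1.11) upper = {upper:.6f}")
    assert lower <= d <= upper  # cf. Theorems 2 and 3
```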

The main aim of this paper is to point out some refinements of (1.2) and (1.11) and to apply them to the entropy mapping and to the mutual information.


2. An Inequality for Relative Entropy

The following result holds.

Theorem 4. Let $p(x)$, $q(x)$, $x\in\mathcal{X}$, be two probability mass functions. Then
$$\frac{1}{\ln b}\left[\sum_{x\in\mathcal{X}}\frac{p^{2}(x)}{q(x)}-1\right] \ge \frac{1}{\ln b}\left\{\sum_{x\in\mathcal{X}}\left[\left(\frac{p(x)}{q(x)}\right)^{p(x)}-1\right]\right\} \tag{2.1}$$
$$\ge D(p\|q) \ge \frac{1}{\ln b}\left\{\sum_{x\in\mathcal{X}}\left[1-\left(\frac{q(x)}{p(x)}\right)^{p(x)}\right]\right\} \ge 0.$$

The equality holds in all inequalities if and only if $p(x)=q(x)$ for all $x\in\mathcal{X}$.

Proof. We know that for every differentiable strictly convex mapping $f:I\to\mathbb{R}$ we have the double inequality
$$f'(x)(x-y) \ge f(x)-f(y) \ge f'(y)(x-y) \quad\text{for all } x,y\in I.$$
The equality holds iff $x=y$.

Now, if we apply this inequality to the strictly convex mapping $-\ln(\cdot)$ on the interval $(0,\infty)$, we deduce
$$\frac{1}{y}(x-y) \ge \ln x-\ln y \ge \frac{1}{x}(x-y) \quad\text{for all } x,y>0,$$
with equality iff $x=y$.

Now, choose $x=a^{a}$, $y=b^{a}$ in the above inequality to deduce
$$\left(\frac{a}{b}\right)^{a}-1 \ge a\ln a-a\ln b \ge 1-\left(\frac{b}{a}\right)^{a} \tag{2.2}$$
with equality iff $a=b$.

If in (2.2) we put $a=p(x)$, $b=q(x)$, then we obtain
$$\left(\frac{p(x)}{q(x)}\right)^{p(x)}-1 \ge p(x)\ln p(x)-p(x)\ln q(x) \ge 1-\left(\frac{q(x)}{p(x)}\right)^{p(x)},$$
with equality iff $p(x)=q(x)$, $x\in\mathcal{X}$.

Summing over $x\in\mathcal{X}$ we get
$$\sum_{x\in\mathcal{X}}\left[\left(\frac{p(x)}{q(x)}\right)^{p(x)}-1\right] \ge D_{e}(p\|q) \ge \sum_{x\in\mathcal{X}}\left[1-\left(\frac{q(x)}{p(x)}\right)^{p(x)}\right],$$
where $D_{e}(p\|q)$ denotes the relative entropy taken with natural logarithms, with equality iff $p(x)=q(x)$, $x\in\mathcal{X}$.

Now, let us observe that
$$\left(\frac{q(x)}{p(x)}\right)^{p(x)} = q(x)\left(\frac{1}{p(x)}\right)^{p(x)}\left(\frac{1}{q(x)}\right)^{1-p(x)}, \quad x\in\mathcal{X}.$$

Using the weighted arithmetic mean - geometric mean inequality, which we recall here,
$$a^{t}b^{1-t} \le ta+(1-t)b, \quad a,b>0,\ t\in(0,1),$$
with equality iff $a=b$, for the choices
$$a=\frac{1}{p(x)},\quad b=\frac{1}{q(x)},\quad t=p(x),\quad x\in\mathcal{X},$$


we get
$$\left(\frac{1}{p(x)}\right)^{p(x)}\left(\frac{1}{q(x)}\right)^{1-p(x)} \le 1+\frac{1-p(x)}{q(x)} \tag{2.3}$$
with equality iff $p(x)=q(x)$, $x\in\mathcal{X}$.

If we multiply (2.3) by $q(x)\ge 0$ we get
$$\left(\frac{q(x)}{p(x)}\right)^{p(x)} \le 1+q(x)-p(x), \quad x\in\mathcal{X}. \tag{2.4}$$

Summing in (2.4) over $x\in\mathcal{X}$, we deduce
$$\sum_{x\in\mathcal{X}}\left(\frac{q(x)}{p(x)}\right)^{p(x)} \le |\mathcal{X}|+\sum_{x\in\mathcal{X}}q(x)-\sum_{x\in\mathcal{X}}p(x) = |\mathcal{X}|,$$
from where we deduce
$$\sum_{x\in\mathcal{X}}\left[1-\left(\frac{q(x)}{p(x)}\right)^{p(x)}\right] \ge 0$$
with equality iff $p(x)=q(x)$, $x\in\mathcal{X}$.

Also, we have
$$\left(\frac{p(x)}{q(x)}\right)^{p(x)} = p(x)\left(\frac{1}{q(x)}\right)^{p(x)}\left(\frac{1}{p(x)}\right)^{1-p(x)}, \quad x\in\mathcal{X}.$$
Using again the weighted A-G inequality we can state that
$$\left(\frac{1}{q(x)}\right)^{p(x)}\left(\frac{1}{p(x)}\right)^{1-p(x)} \le \frac{p(x)}{q(x)}+\frac{1}{p(x)}-1.$$

Multiplying this inequality by $p(x)\ge 0$ we get
$$\left(\frac{p(x)}{q(x)}\right)^{p(x)} \le 1+\frac{p^{2}(x)}{q(x)}-p(x), \quad x\in\mathcal{X},$$
which gives, by summation over $x$, that
$$\sum_{x\in\mathcal{X}}\left(\frac{p(x)}{q(x)}\right)^{p(x)} \le |\mathcal{X}|+\sum_{x\in\mathcal{X}}\frac{p^{2}(x)}{q(x)}-\sum_{x\in\mathcal{X}}p(x) = |\mathcal{X}|+\sum_{x\in\mathcal{X}}\frac{p^{2}(x)}{q(x)}-1,$$
i.e.,
$$\sum_{x\in\mathcal{X}}\left[\left(\frac{p(x)}{q(x)}\right)^{p(x)}-1\right] \le \sum_{x\in\mathcal{X}}\frac{p^{2}(x)}{q(x)}-1,$$
with equality iff $p(x)=q(x)$, $x\in\mathcal{X}$.

As $\log_{b}x = \frac{\ln x}{\ln b}$, we deduce the desired inequality (2.1).

The following corollaries are important.

Corollary 7. Let $X$ be a random variable with the probability distribution $p(x)$, $x\in\mathcal{X}$. Then we have the inequality
$$\frac{1}{\ln b}\left[|\mathcal{X}|\sum_{x\in\mathcal{X}}p^{2}(x)-1\right] \ge \frac{1}{\ln b}\sum_{x\in\mathcal{X}}\left[\bigl(|\mathcal{X}|\,p(x)\bigr)^{p(x)}-1\right] \tag{2.5}$$
$$\ge \log|\mathcal{X}|-H(X) \ge \frac{1}{\ln b}\sum_{x\in\mathcal{X}}\left[1-\frac{1}{\bigl(|\mathcal{X}|\,p(x)\bigr)^{p(x)}}\right] \ge 0.$$

The equality holds in all inequalities iff $p$ is uniformly distributed on $\mathcal{X}$.

The proof follows from the above theorem by choosing $q=u$, the uniform distribution on $\mathcal{X}$.

The following corollary for mutual information also holds.

Corollary 8. Let $X$ and $Y$ be as above. Then
$$\frac{1}{\ln b}\left[\sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{Y}}\frac{p^{2}(x,y)}{p(x)q(y)}-1\right] \ge \frac{1}{\ln b}\sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{Y}}\left[\left(\frac{p(x,y)}{p(x)q(y)}\right)^{p(x,y)}-1\right] \tag{2.6}$$
$$\ge I(X;Y) \ge \frac{1}{\ln b}\sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{Y}}\left[1-\left(\frac{p(x)q(y)}{p(x,y)}\right)^{p(x,y)}\right] \ge 0.$$

The equality holds in all inequalities iff $X$ and $Y$ are independent.
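The refinement (2.1), and hence its specializations (2.5) and (2.6), can be checked numerically in the same spirit. The sketch below is again an editorial illustration under the same assumptions as before (finite alphabet, strictly positive probabilities, base $b=2$, arbitrarily chosen $p$ and $q$): it evaluates each member of the chain (2.1) and asserts the ordering.

```python
import math


def chain_2_1(p, q, b=2.0):
    """Return the four members of (2.1), left to right, for strictly positive p, q."""
    ln_b = math.log(b)
    chi2_term = (sum(pi * pi / qi for pi, qi in zip(p, q)) - 1.0) / ln_b
    upper = sum((pi / qi) ** pi - 1.0 for pi, qi in zip(p, q)) / ln_b
    d = sum(pi * math.log(pi / qi, b) for pi, qi in zip(p, q))
    lower = sum(1.0 - (qi / pi) ** pi for pi, qi in zip(p, q)) / ln_b
    return chi2_term, upper, d, lower


if __name__ == "__main__":
    # Arbitrary strictly positive probability vectors on a 3-letter alphabet.
    p = [0.5, 0.3, 0.2]
    q = [0.4, 0.4, 0.2]
    chi2_term, upper, d, lower = chain_2_1(p, q)
    print(f"{chi2_term:.6f} >= {upper:.6f} >= {d:.6f} >= {lower:.6f} >= 0")
    assert chi2_term >= upper >= d >= lower >= 0.0  # cf. (2.1)
```

For the displayed pair this prints approximately $0.0721 \ge 0.0510 \ge 0.0365 \ge 0.0223 \ge 0$, in agreement with (2.1).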

For recent results on applications of the theory of inequalities in information theory and coding, see the papers listed below.

References

[1] T.M. Cover and J.A. Thomas, Elements of Information Theory, John Wiley & Sons Inc., 1991.

[2] S.S. Dragomir and N.M. Dragomir, An inequality for logarithms and its applications in coding theory, Indian Journal of Mathematics (in press).

[3] S.S. Dragomir and C.J. Goh, Some bounds on entropy measures in information theory, Appl. Math. Lett., 10(3) (1997) 23–28.

[4] S.S. Dragomir and C.J. Goh, A counterpart of Jensen's discrete inequality for differentiable convex mappings and applications in information theory, Math. Comput. Modelling, 24(2) (1996) 1–11.

[5] M. Matić, C.E.M. Pearce and J. Pečarić, Improvements of some bounds on entropy measures in information theory, Math. Inequal. and Appl., 1 (1998) 295–304.

[6] M. Matić, C.E.M. Pearce and J. Pečarić, Refinements of some bounds in information theory, J. Austr. Math. Soc. Ser. B, to appear.

[7] S.S. Dragomir, N.M. Dragomir and K. Pranesh, Some estimations of Kraft number and related results, Soochow Journal of Mathematics, 24(4), 291–296.

[8] S.S. Dragomir, M. Sholz and J. Sunde, Some upper bounds for relative entropy and applications, submitted.

[9] S.S. Dragomir, Some remarks on the noiseless coding theorem, submitted.

[10] C. Calude and C. Grozea, Kraft-Chaitin inequality revisited, J. UCS, 5 (1996), 306–310.

School of Communications and Informatics, Victoria University of Technology

E-mail address: [email protected]

URL: http://matilda.vut.edu.au/~rgmia/
