Lecture-11: Generating functions
1 Generating functions
Suppose that X: Ω → R is a continuous random variable on the probability space (Ω, F, P) with distribution function F_X: R → [0, 1].
1.1 Characteristic function
Example 1.1. Let j ≜ √−1. Then the function h_u: R → C defined by h_u(x) ≜ e^{jux} = cos(ux) + j sin(ux) is Borel measurable for all u ∈ R. Thus, h_u(X): Ω → C is a complex-valued random variable on this probability space.
Definition 1.2. For a random variable X: Ω → R defined on the probability space (Ω, F, P), the characteristic function Φ_X: R → C is defined by Φ_X(u) ≜ E[e^{juX}] for all u ∈ R, where j² = −1.
Remark 1. The characteristic function Φ_X(u) is always finite, since |Φ_X(u)| = |E[e^{juX}]| ⩽ E|e^{juX}| = 1.
Remark 2. For a discrete random variable X: Ω → X with PMF P_X: X → [0, 1], the characteristic function is Φ_X(u) = ∑_{x∈X} e^{jux} P_X(x).
Remark 3. For a continuous random variable X: Ω → R with density function f_X: R → R_+, the characteristic function is Φ_X(u) = ∫_{−∞}^{∞} e^{jux} f_X(x) dx.
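For instance, for an exponential random variable X with density f_X(x) = λe^{−λx} for x ⩾ 0 and rate λ > 0, the characteristic function is Φ_X(u) = ∫_0^∞ e^{jux} λe^{−λx} dx = λ/(λ − ju).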
Example 1.3 (Gaussian random variable). For a Gaussian random variable X: Ω → R with mean µ and variance σ², the characteristic function Φ_X is
Φ_X(u) = (1/√(2πσ²)) ∫_{x∈R} e^{jux} e^{−(x−µ)²/(2σ²)} dx = exp(−u²σ²/2 + juµ).
We observe that |Φ_X(u)| = e^{−u²σ²/2} has Gaussian decay with zero mean and variance 1/σ².
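A short sketch of how the formula follows: completing the square in the exponent gives
jux − (x − µ)²/(2σ²) = −(x − µ − juσ²)²/(2σ²) + juµ − u²σ²/2,
so that Φ_X(u) = e^{juµ − u²σ²/2} · (1/√(2πσ²)) ∫_{x∈R} e^{−(x−µ−juσ²)²/(2σ²)} dx = e^{juµ − u²σ²/2},
since the remaining complex-shifted Gaussian integral still evaluates to √(2πσ²).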
Theorem 1.4. If E|X|^N is finite for some integer N ∈ N, then Φ_X^{(k)}(u) is finite and a continuous function of u ∈ R for all k ∈ [N]. Further, Φ_X^{(k)}(0) = j^k E[X^k] for all k ∈ [N].
Proof. Since E|X|^N is finite, so is E|X|^k for all k ∈ [N]. Therefore, E[X^k] exists and is finite. Exchanging the derivative and the expectation (which is justified since |d^k e^{juX}/du^k| = |X|^k is integrable for all k ∈ [N]), and evaluating the derivative at u = 0, we get Φ_X^{(k)}(0) = E[d^k e^{juX}/du^k |_{u=0}] = j^k E[X^k].
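As a quick check of Theorem 1.4 for the Gaussian random variable of Example 1.3: differentiating Φ_X(u) = exp(−u²σ²/2 + juµ) gives Φ'_X(u) = (jµ − uσ²)Φ_X(u) and Φ''_X(u) = [(jµ − uσ²)² − σ²]Φ_X(u), so that Φ'_X(0) = jµ and Φ''_X(0) = −(µ² + σ²). Hence E[X] = µ and E[X²] = µ² + σ², as expected.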
Theorem 1.5. Two random variables have the same distribution iff they have the same characteristic function.
Proof. Necessity is immediate, since the characteristic function is determined by the distribution; sufficiency is harder and is omitted here.
1.2 Moment generating function
Example 1.6. The function g_t: R → R_+ defined by g_t(x) ≜ e^{tx} is monotone and hence Borel measurable for all t ∈ R. Therefore, g_t(X): Ω → R_+ is a positive random variable on this probability space.
Definition 1.7. For a random variable X: Ω → R defined on the probability space (Ω, F, P), the moment generating function M_X: R → R_+ is defined by M_X(t) ≜ E[e^{tX}] for all t ∈ R where M_X(t) is finite.
Remark 4. Characteristic functions always exist, but are complex-valued in general. It is sometimes easier to work with moment generating functions, when they exist.
Lemma 1.8. For a random variable X, if the MGF M_X(t) is finite for some t ∈ R, then M_X(t) = 1 + ∑_{n∈N} (t^n/n!) E[X^n].
Proof. From the Taylor series expansion of e^θ around θ = 0, we get e^θ = 1 + ∑_{n∈N} θ^n/n!. Therefore, taking θ = tX, we can write e^{tX} = 1 + ∑_{n∈N} (t^n/n!) X^n. Taking expectations on both sides, the result follows from the linearity of expectation, when both sides have finite expectation.
Example 1.9 (Gaussian random variable). For a Gaussian random variable X: Ω → R with mean µ and variance σ², the moment generating function M_X is M_X(t) = exp(t²σ²/2 + tµ).
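For instance, combining Lemma 1.8 with Example 1.9 in the zero-mean case µ = 0: expanding M_X(t) = exp(t²σ²/2) = ∑_{m⩾0} (t²σ²/2)^m/m! and matching coefficients of t^n with 1 + ∑_{n∈N} (t^n/n!) E[X^n] gives E[X^{2m}] = (2m)! σ^{2m}/(2^m m!) and E[X^{2m+1}] = 0; in particular, E[X²] = σ² and E[X⁴] = 3σ⁴.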
1.3 Probability generating function
For a non-negative integer-valued random variable X: Ω → X ⊆ Z_+, it is often more convenient to work with the z-transform of the probability mass function, called the probability generating function.
Definition 1.10. For a discrete non-negative integer-valued random variable X: Ω → X ⊆ Z_+ with probability mass function P_X: X → [0, 1], the probability generating function Ψ_X: C → C is defined by
Ψ_X(z) ≜ E[z^X] = ∑_{x∈X} z^x P_X(x), z ∈ C, |z| ⩽ 1.
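As a standard example, for a Poisson random variable X with parameter λ > 0 and PMF P_X(x) = e^{−λ} λ^x/x! for x ∈ Z_+, the probability generating function is Ψ_X(z) = ∑_{x⩾0} z^x e^{−λ} λ^x/x! = e^{−λ} e^{λz} = e^{λ(z−1)}.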
Lemma 1.11. For a non-negative simple random variable X: Ω → X, we have |Ψ_X(z)| ⩽ 1 for all |z| ⩽ 1.
Proof. Let z ∈ C with |z| ⩽ 1, and let P_X: X → [0, 1] be the probability mass function of the non-negative simple random variable X. Since any realization x ∈ X of the random variable X is non-negative, we can write
|Ψ_X(z)| = |∑_{x∈X} z^x P_X(x)| ⩽ ∑_{x∈X} |z|^x P_X(x) ⩽ ∑_{x∈X} P_X(x) = 1.
Theorem 1.12. For a non-negative simple random variable X: Ω → X with finite k-th moment E[X^k], the k-th derivative of the probability generating function evaluated at z = 1 is the k-th order factorial moment of X. That is,
Ψ_X^{(k)}(1) = E[∏_{i=0}^{k−1} (X − i)] = E[X(X−1)(X−2)···(X−k+1)].
Proof. Differentiating Ψ_X(z) = E[z^X] term by term k times gives Ψ_X^{(k)}(z) = E[X(X−1)···(X−k+1) z^{X−k}]; the result follows from the interchange of derivative and expectation and evaluation at z = 1.
Remark 5. Moments can be recovered from the k-th order factorial moments. For example, E[X] = Ψ'_X(1) and E[X²] = Ψ_X^{(2)}(1) + Ψ'_X(1).
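For the Poisson example above, Ψ_X(z) = e^{λ(z−1)} gives Ψ_X^{(k)}(1) = λ^k, so the k-th factorial moment of a Poisson random variable is λ^k; in particular, E[X] = λ, E[X²] = λ² + λ, and hence Var(X) = λ.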
Theorem 1.13. Two non-negative integer-valued random variables have the same probability distribution iff their z-transforms are equal.
Proof. The necessity is clear. For sufficiency, interchanging the derivative and the summation (by the dominated convergence theorem), we see that Ψ_X^{(k)}(z) = ∑_{x⩾k} x(x−1)···(x−k+1) z^{x−k} P_X(x), and hence Ψ_X^{(k)}(0) = k! P_X(k). Therefore, the z-transform determines the PMF, and the result follows.
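As a simple illustration of this inversion, for a Bernoulli random variable X with P_X(1) = p and P_X(0) = 1 − p, we have Ψ_X(z) = 1 − p + pz, and indeed Ψ_X(0) = 1 − p = P_X(0) and Ψ'_X(0) = p = 1! P_X(1).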
2 Gaussian Random Vectors
Definition 2.1. For a random vector X: Ω → R^n defined on a probability space (Ω, F, P), we can define the characteristic function Φ_X: R^n → C by Φ_X(u) ≜ E[e^{j⟨u,X⟩}], where u ∈ R^n.
Remark 6. If X: Ω → R^n is an independent random vector, then Φ_X(u) = ∏_{i=1}^n Φ_{X_i}(u_i) for all u ∈ R^n.
Definition 2.2. For a probability space (Ω, F, P), a Gaussian random vector is a continuous random vector X: Ω → R^n defined by its density function
f_X(x) ≜ (1/√((2π)^n det(Σ))) exp(−(1/2)(x − µ)^T Σ^{−1} (x − µ)) for all x ∈ R^n,
where µ ∈ R^n is a vector and Σ ∈ R^{n×n} is a positive definite matrix.
Remark 7. For a Gaussian random vector with vector µ = (µ_1, ..., µ_1) for some real scalar µ_1 and matrix Σ = σ²I for some positive σ² ∈ R_+, we can write its density as f_X(x) = (1/(2πσ²)^{n/2}) exp(−(1/(2σ²)) ∑_{i=1}^n (x_i − µ_1)²) for all x ∈ R^n. It follows that X is an i.i.d. random vector with each component being a Gaussian random variable with mean µ_1 and variance σ². The characteristic function Φ_X of an i.i.d. Gaussian random vector X: Ω → R^n parametrized by (µ_1, σ²) is given by Φ_X(u) = ∏_{i=1}^n Φ_{X_i}(u_i) = exp(−(σ²/2) ∑_{i=1}^n u_i² + jµ_1 ∑_{i=1}^n u_i).
Lemma 2.3. For an i.i.d. zero mean unit variance Gaussian vector Z: Ω → R^n, a vector α ∈ R^n, and a scalar µ ∈ R, the affine combination Y ≜ µ + ⟨α, Z⟩ is a Gaussian random variable.
Proof. From the linearity of expectation and the fact that Z is a zero mean vector, we get E[Y] = µ. Further, from the linearity of expectation and the fact that E[ZZ^T] = I, we get
σ² ≜ Var(Y) = E(Y − µ)² = ∑_{i=1}^n ∑_{k=1}^n α_i α_k E[Z_i Z_k] = ⟨α, α⟩ = ∥α∥_2² = ∑_{i=1}^n α_i².
To show that Y is Gaussian, it suffices to show that Φ_Y(u) = exp(−u²σ²/2 + juµ) for any u ∈ R. Recall that Z is an independent random vector with individual components being identically zero mean unit variance Gaussian. Therefore, Φ_{Z_i}(u) = exp(−u²/2), and we can compute the characteristic function of Y as
Φ_Y(u) = E[e^{juY}] = e^{juµ} E[∏_{i=1}^n e^{juα_i Z_i}] = e^{juµ} ∏_{i=1}^n Φ_{Z_i}(uα_i) = exp(−u²σ²/2 + juµ).
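For example, taking µ = 0 and α = (1/√n, ..., 1/√n) in Lemma 2.3 shows that the normalized sum Y = (1/√n) ∑_{i=1}^n Z_i of i.i.d. standard Gaussians is again a zero mean unit variance Gaussian random variable, since σ² = ∑_{i=1}^n α_i² = 1.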
Theorem 2.4. A random vector X: Ω → R^n is Gaussian with parameters (µ, Σ) iff Z ≜ Σ^{−1/2}(X − µ) is an i.i.d. zero mean unit variance Gaussian random vector.
Proof. Let X = µ + Σ^{1/2} Z for an i.i.d. zero mean unit variance Gaussian random vector Z: Ω → R^n; we show that X is a Gaussian random vector by the transformation of densities of random vectors. Since the (i, j)-th component of the Jacobian matrix J(x) is given by J_{ij}(x) = ∂x_i/∂z_j = Σ^{1/2}_{ij} for all i, j ∈ [n],
we can write the Jacobian matrix as J(x) = Σ^{1/2}. Since the density of Z is f_Z(z) = (1/√((2π)^n)) exp(−(1/2) z^T z), from the transformation of random vectors we get
f_X(x) = f_Z(Σ^{−1/2}(x − µ)) / det(Σ^{1/2}) = (1/((2π)^{n/2} det(Σ)^{1/2})) exp(−(1/2)(x − µ)^T Σ^{−1}(x − µ)), x ∈ R^n.
Conversely, we can show that if X is a Gaussian random vector, then Z = Σ^{−1/2}(X − µ) is an i.i.d. zero mean unit variance Gaussian random vector, again by the transformation of random vectors.
Remark 8. A random vector X: Ω → R^n with mean µ ∈ R^n and covariance matrix Σ ∈ R^{n×n} is Gaussian iff X can be written as X = µ + Σ^{1/2} Z, for an i.i.d. Gaussian random vector Z: Ω → R^n with mean 0 and variance 1. It follows that E[X] = µ and Σ = E(X − µ)(X − µ)^T.
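A one-line check of the covariance claim, using E[ZZ^T] = I and the symmetry of Σ^{1/2}:
E(X − µ)(X − µ)^T = E[Σ^{1/2} Z Z^T Σ^{1/2}] = Σ^{1/2} E[ZZ^T] Σ^{1/2} = Σ^{1/2} Σ^{1/2} = Σ.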
Remark 9. We observe that the components of the Gaussian random vector X = µ + AZ for A = Σ^{1/2} are Gaussian random variables with mean µ_i and variance ∑_{k=1}^n A_{ik}² = (AA^T)_{ii} = Σ_{ii}, since each component X_i = µ_i + ∑_{k=1}^n A_{ik} Z_k is an affine combination of zero mean unit variance i.i.d. random variables.
Remark 10. For any u ∈ R^n, we can compute the characteristic function Φ_X from the distribution of Z as
Φ_X(u) = E[e^{j⟨u,X⟩}] = E[exp(j⟨u, µ⟩ + j⟨A^T u, Z⟩)] = exp(j⟨u, µ⟩) Φ_Z(A^T u) = exp(j⟨u, µ⟩ − (1/2) u^T Σ u).
Lemma 2.5. If the components of a Gaussian random vector are uncorrelated, then they are independent.
Proof. If a Gaussian random vector has uncorrelated components, then its covariance matrix Σ is diagonal. It follows that we can write f_X(x) = ∏_{i=1}^n f_{X_i}(x_i) for all x ∈ R^n.
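Explicitly, with Σ = diag(σ_1², ..., σ_n²), the density of Definition 2.2 factors as
f_X(x) = ∏_{i=1}^n (1/√(2πσ_i²)) exp(−(x_i − µ_i)²/(2σ_i²)) = ∏_{i=1}^n f_{X_i}(x_i),
which is exactly the statement that X_1, ..., X_n are independent Gaussian random variables.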