Lecture-11: Generating functions
1 Generating functions
Suppose that X: Ω → R is a continuous random variable on the probability space (Ω, F, P) with distribution function F_X: R → [0, 1].
1.1 Characteristic function
Example 1.1. Let j ≜ √−1. Then the function h_u: R → C defined by h_u(x) ≜ e^{jux} = cos(ux) + j sin(ux) is Borel measurable for all u ∈ R. Thus, h_u(X): Ω → C is a complex-valued random variable on this probability space.
Definition 1.2. For a random variable X: Ω → R defined on the probability space (Ω, F, P), the characteristic function Φ_X: R → C is defined by Φ_X(u) ≜ E[e^{juX}] for all u ∈ R, where j² = −1.
Remark 1. The characteristic function Φ_X(u) is always finite, since |Φ_X(u)| = |E[e^{juX}]| ⩽ E|e^{juX}| = 1.
Remark 2. For a discrete random variable X: Ω → X with PMF P_X: X → [0, 1], the characteristic function is Φ_X(u) = ∑_{x∈X} e^{jux} P_X(x).
Remark 3. For a continuous random variable X: Ω → R with density function f_X: R → R_+, the characteristic function is Φ_X(u) = ∫_{−∞}^{∞} e^{jux} f_X(x) dx.
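For instance, for an exponential random variable X with density f_X(x) = λe^{−λx} for x ⩾ 0 and rate λ > 0, the characteristic function is Φ_X(u) = ∫_0^∞ e^{jux} λe^{−λx} dx = λ/(λ − ju).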
Example 1.3 (Gaussian random variable). For a Gaussian random variable X: Ω → R with mean µ and variance σ², the characteristic function Φ_X is
Φ_X(u) = (1/√(2πσ²)) ∫_{x∈R} e^{jux} e^{−(x−µ)²/(2σ²)} dx = exp(−u²σ²/2 + juµ).
We observe that |Φ_X(u)| = e^{−u²σ²/2} has Gaussian decay with zero mean and variance 1/σ².
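A short sketch of how the formula follows: completing the square in the exponent gives
jux − (x − µ)²/(2σ²) = −(x − µ − juσ²)²/(2σ²) + juµ − u²σ²/2,
so that Φ_X(u) = e^{juµ − u²σ²/2} · (1/√(2πσ²)) ∫_{x∈R} e^{−(x−µ−juσ²)²/(2σ²)} dx = e^{juµ − u²σ²/2},
since the remaining complex-shifted Gaussian integral still evaluates to √(2πσ²).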
Theorem 1.4. If E|X|^N is finite for some integer N ∈ N, then Φ_X^{(k)}(u) is finite and a continuous function of u ∈ R for all k ∈ [N]. Further, Φ_X^{(k)}(0) = j^k E[X^k] for all k ∈ [N].
Proof. Since E|X|^N is finite, so is E|X|^k for all k ∈ [N]. Therefore, E[X^k] exists and is finite. Exchanging the derivative and the expectation (which is justified since |d^k e^{juX}/du^k| = |X|^k is integrable for all k ∈ [N]), and evaluating the derivative at u = 0, we get Φ_X^{(k)}(0) = E[d^k e^{juX}/du^k |_{u=0}] = j^k E[X^k].
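As a quick check of Theorem 1.4 for the Gaussian random variable of Example 1.3: differentiating Φ_X(u) = exp(−u²σ²/2 + juµ) gives Φ'_X(u) = (jµ − uσ²)Φ_X(u) and Φ''_X(u) = [(jµ − uσ²)² − σ²]Φ_X(u), so that Φ'_X(0) = jµ and Φ''_X(0) = −(µ² + σ²). Hence E[X] = µ and E[X²] = µ² + σ², as expected.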
Theorem 1.5. Two random variables have the same distribution iff they have the same characteristic function.
Proof. Necessity is immediate, since the characteristic function is determined by the distribution; sufficiency is harder and is omitted here.
1.2 Moment generating function
Example 1.6. The function g_t: R → R_+ defined by g_t(x) ≜ e^{tx} is monotone and hence Borel measurable for all t ∈ R. Therefore, g_t(X): Ω → R_+ is a positive random variable on this probability space.
Definition 1.7. For a random variable X: Ω → R defined on the probability space (Ω, F, P), the moment generating function M_X: R → R_+ is defined by M_X(t) ≜ E[e^{tX}] for all t ∈ R where M_X(t) is finite.
Remark 4. Characteristic functions always exist, but are complex-valued in general. It is sometimes easier to work with moment generating functions, when they exist.
Lemma 1.8. For a random variable X, if the MGF M_X(t) is finite for some t ∈ R, then M_X(t) = 1 + ∑_{n∈N} (t^n/n!) E[X^n].
Proof. From the Taylor series expansion of e^θ around θ = 0, we get e^θ = 1 + ∑_{n∈N} θ^n/n!. Therefore, taking θ = tX, we can write e^{tX} = 1 + ∑_{n∈N} (t^n/n!) X^n. Taking expectations on both sides, the result follows from the linearity of expectation, when both sides have finite expectation.
Example 1.9 (Gaussian random variable). For a Gaussian random variable X: Ω → R with mean µ and variance σ², the moment generating function M_X is M_X(t) = exp(t²σ²/2 + tµ).
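For instance, combining Lemma 1.8 with Example 1.9 in the zero-mean case µ = 0: expanding M_X(t) = exp(t²σ²/2) = ∑_{m⩾0} (t²σ²/2)^m/m! and matching coefficients of t^n with 1 + ∑_{n∈N} (t^n/n!) E[X^n] gives E[X^{2m}] = (2m)! σ^{2m}/(2^m m!) and E[X^{2m+1}] = 0; in particular, E[X²] = σ² and E[X⁴] = 3σ⁴.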
1.3 Probability generating function
For a non-negative integer-valued random variable X: Ω → X ⊆ Z_+, it is often more convenient to work with the z-transform of the probability mass function, called the probability generating function.
Definition 1.10. For a discrete non-negative integer-valued random variable X: Ω → X ⊆ Z_+ with probability mass function P_X: X → [0, 1], the probability generating function Ψ_X: C → C is defined by
Ψ_X(z) ≜ E[z^X] = ∑_{x∈X} z^x P_X(x), z ∈ C, |z| ⩽ 1.
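As a standard example, for a Poisson random variable X with parameter λ > 0 and PMF P_X(x) = e^{−λ} λ^x/x! for x ∈ Z_+, the probability generating function is Ψ_X(z) = ∑_{x⩾0} z^x e^{−λ} λ^x/x! = e^{−λ} e^{λz} = e^{λ(z−1)}.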
Lemma 1.11. For a non-negative simple random variable X: Ω → X, we have |Ψ_X(z)| ⩽ 1 for all |z| ⩽ 1.
Proof. Let z ∈ C with |z| ⩽ 1, and let P_X: X → [0, 1] be the probability mass function of the non-negative simple random variable X. Since any realization x ∈ X of the random variable X is non-negative, we can write
|Ψ_X(z)| = |∑_{x∈X} z^x P_X(x)| ⩽ ∑_{x∈X} |z|^x P_X(x) ⩽ ∑_{x∈X} P_X(x) = 1.
Theorem 1.12. For a non-negative simple random variable X: Ω → X with finite k-th moment E[X^k], the k-th derivative of the probability generating function evaluated at z = 1 is the k-th order factorial moment of X. That is,
Ψ_X^{(k)}(1) = E[∏_{i=0}^{k−1} (X − i)] = E[X(X−1)(X−2)···(X−k+1)].
Proof. Differentiating Ψ_X(z) = E[z^X] term by term k times gives Ψ_X^{(k)}(z) = E[X(X−1)···(X−k+1) z^{X−k}]; the result follows from the interchange of derivative and expectation and evaluation at z = 1.
Remark 5. Moments can be recovered from the k-th order factorial moments. For example, E[X] = Ψ'_X(1) and E[X²] = Ψ_X^{(2)}(1) + Ψ'_X(1).
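For the Poisson example above, Ψ_X(z) = e^{λ(z−1)} gives Ψ_X^{(k)}(1) = λ^k, so the k-th factorial moment of a Poisson random variable is λ^k; in particular, E[X] = λ, E[X²] = λ² + λ, and hence Var(X) = λ.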
Theorem 1.13. Two non-negative integer-valued random variables have the same probability distribution iff their z-transforms are equal.
Proof. The necessity is clear. For sufficiency, interchanging the derivative and the summation (by the dominated convergence theorem), we see that Ψ_X^{(k)}(z) = ∑_{x⩾k} x(x−1)···(x−k+1) z^{x−k} P_X(x), and hence Ψ_X^{(k)}(0) = k! P_X(k). Therefore, the z-transform determines the PMF, and the result follows.
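As a simple illustration of this inversion, for a Bernoulli random variable X with P_X(1) = p and P_X(0) = 1 − p, we have Ψ_X(z) = 1 − p + pz, and indeed Ψ_X(0) = 1 − p = P_X(0) and Ψ'_X(0) = p = 1! P_X(1).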
2 Gaussian Random Vectors
Definition 2.1. For a random vector X: Ω → R^n defined on a probability space (Ω, F, P), we can define the characteristic function Φ_X: R^n → C by Φ_X(u) ≜ E[e^{j⟨u,X⟩}], where u ∈ R^n.
Remark 6. If X: Ω → R^n is an independent random vector, then Φ_X(u) = ∏_{i=1}^n Φ_{X_i}(u_i) for all u ∈ R^n.
Definition 2.2. For a probability space (Ω, F, P), a Gaussian random vector is a continuous random vector X: Ω → R^n defined by its density function
f_X(x) ≜ (1/√((2π)^n det(Σ))) exp(−(1/2)(x − µ)^T Σ^{−1} (x − µ)) for all x ∈ R^n,
where µ ∈ R^n is a vector and Σ ∈ R^{n×n} is a positive definite matrix.
Remark 7. For a Gaussian random vector with vector µ = (µ_1, ..., µ_1) for some real scalar µ_1 and matrix Σ = σ²I for some positive σ² ∈ R_+, we can write its density as f_X(x) = (1/(2πσ²)^{n/2}) exp(−(1/(2σ²)) ∑_{i=1}^n (x_i − µ_1)²) for all x ∈ R^n. It follows that X is an i.i.d. random vector with each component being a Gaussian random variable with mean µ_1 and variance σ². The characteristic function Φ_X of an i.i.d. Gaussian random vector X: Ω → R^n parametrized by (µ_1, σ²) is given by Φ_X(u) = ∏_{i=1}^n Φ_{X_i}(u_i) = exp(−(σ²/2) ∑_{i=1}^n u_i² + jµ_1 ∑_{i=1}^n u_i).
Lemma 2.3. For an i.i.d. zero mean unit variance Gaussian vector Z: Ω → R^n, a vector α ∈ R^n, and a scalar µ ∈ R, the affine combination Y ≜ µ + ⟨α, Z⟩ is a Gaussian random variable.
Proof. From the linearity of expectation and the fact that Z is a zero mean vector, we get E[Y] = µ. Further, from the linearity of expectation and the fact that E[ZZ^T] = I, we get
σ² ≜ Var(Y) = E(Y − µ)² = ∑_{i=1}^n ∑_{k=1}^n α_i α_k E[Z_i Z_k] = ⟨α, α⟩ = ∥α∥_2² = ∑_{i=1}^n α_i².
To show that Y is Gaussian, it suffices to show that Φ_Y(u) = exp(−u²σ²/2 + juµ) for any u ∈ R. Recall that Z is an independent random vector with individual components being identically zero mean unit variance Gaussian. Therefore, Φ_{Z_i}(u) = exp(−u²/2), and we can compute the characteristic function of Y as
Φ_Y(u) = E[e^{juY}] = e^{juµ} E[∏_{i=1}^n e^{juα_i Z_i}] = e^{juµ} ∏_{i=1}^n Φ_{Z_i}(uα_i) = exp(−u²σ²/2 + juµ).
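For example, taking µ = 0 and α = (1/√n, ..., 1/√n) in Lemma 2.3 shows that the normalized sum Y = (1/√n) ∑_{i=1}^n Z_i of i.i.d. standard Gaussians is again a zero mean unit variance Gaussian random variable, since σ² = ∑_{i=1}^n α_i² = 1.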
Theorem 2.4. A random vector X: Ω → R^n is Gaussian with parameters (µ, Σ) iff Z ≜ Σ^{−1/2}(X − µ) is an i.i.d. zero mean unit variance Gaussian random vector.
Proof. Let X = µ + Σ^{1/2} Z for an i.i.d. zero mean unit variance Gaussian random vector Z: Ω → R^n; we show that X is a Gaussian random vector by the transformation of densities of random vectors. Since the (i, j)-th component of the Jacobian matrix J(x) is given by J_{ij}(x) = ∂x_i/∂z_j = Σ^{1/2}_{ij} for all i, j ∈ [n],
we can write the Jacobian matrix as J(x) = Σ^{1/2}. Since the density of Z is f_Z(z) = (1/√((2π)^n)) exp(−(1/2) z^T z), from the transformation of random vectors we get
f_X(x) = f_Z(Σ^{−1/2}(x − µ)) / det(Σ^{1/2}) = (1/((2π)^{n/2} det(Σ)^{1/2})) exp(−(1/2)(x − µ)^T Σ^{−1}(x − µ)), x ∈ R^n.
Conversely, we can show that if X is a Gaussian random vector, then Z = Σ^{−1/2}(X − µ) is an i.i.d. zero mean unit variance Gaussian random vector, again by the transformation of random vectors.
Remark 8. A random vector X: Ω → R^n with mean µ ∈ R^n and covariance matrix Σ ∈ R^{n×n} is Gaussian iff X can be written as X = µ + Σ^{1/2} Z, for an i.i.d. Gaussian random vector Z: Ω → R^n with mean 0 and variance 1. It follows that E[X] = µ and Σ = E(X − µ)(X − µ)^T.
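A one-line check of the covariance claim, using E[ZZ^T] = I and the symmetry of Σ^{1/2}:
E(X − µ)(X − µ)^T = E[Σ^{1/2} Z Z^T Σ^{1/2}] = Σ^{1/2} E[ZZ^T] Σ^{1/2} = Σ^{1/2} Σ^{1/2} = Σ.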
Remark 9. We observe that the components of the Gaussian random vector X = µ + AZ for A = Σ^{1/2} are Gaussian random variables with mean µ_i and variance ∑_{k=1}^n A_{ik}² = (AA^T)_{ii} = Σ_{ii}, since each component X_i = µ_i + ∑_{k=1}^n A_{ik} Z_k is an affine combination of zero mean unit variance i.i.d. random variables.
Remark 10. For any u ∈ R^n, we can compute the characteristic function Φ_X from the distribution of Z as
Φ_X(u) = E[e^{j⟨u,X⟩}] = E[exp(j⟨u, µ⟩ + j⟨A^T u, Z⟩)] = exp(j⟨u, µ⟩) Φ_Z(A^T u) = exp(j⟨u, µ⟩ − (1/2) u^T Σ u).
Lemma 2.5. If the components of a Gaussian random vector are uncorrelated, then they are independent.
Proof. If a Gaussian random vector has uncorrelated components, then its covariance matrix Σ is diagonal. It follows that we can write f_X(x) = ∏_{i=1}^n f_{X_i}(x_i) for all x ∈ R^n.
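Explicitly, with Σ = diag(σ_1², ..., σ_n²), the density of Definition 2.2 factors as
f_X(x) = ∏_{i=1}^n (1/√(2πσ_i²)) exp(−(x_i − µ_i)²/(2σ_i²)) = ∏_{i=1}^n f_{X_i}(x_i),
which is exactly the statement that X_1, ..., X_n are independent Gaussian random variables.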