Basic Concepts from Probability Theory
2.3 Joint and Conditional Distributions
When using the standard normal distribution, however, one obtains a much smaller probability than the bound due to (2.4):

$$P(|X - \mu| \geq 2\sigma) = P\left(\frac{|X-\mu|}{\sigma} \geq 2\right) = 2\,P\left(\frac{X-\mu}{\sigma} \geq 2\right) \approx 0.0455\,.$$
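As a quick numerical cross-check (an added sketch using scipy, not part of the original text), the exact two-sided tail probability of a normal random variable can be compared with the Chebyshev bound $1/k^2$ for $k = 2$:

```python
from scipy.stats import norm

# Chebyshev bound for k = 2: P(|X - mu| >= 2*sigma) <= 1/2**2
chebyshev_bound = 1 / 2**2

# Exact probability for a normal random variable: 2 * P(Z >= 2), Z standard normal
exact = 2 * norm.sf(2)  # sf(2) = 1 - cdf(2)

print(f"Chebyshev bound:          {chebyshev_bound:.4f}")  # 0.2500
print(f"exact normal probability: {exact:.4f}")            # 0.0455
```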
The variables are called stochastically independent if, for arbitrary arguments, the joint density is given as the product of the marginal densities:

$$f_{x,y,z}(x,y,z) = f_x(x)\, f_y(y)\, f_z(z)\,,$$

which implies pairwise independence:

$$f_{x,y}(x,y) = f_x(x)\, f_y(y)\,.$$

The joint probability
$$P(X \le a,\; Y \le b,\; Z \le c) = \int_{-\infty}^{c}\int_{-\infty}^{b}\int_{-\infty}^{a} f_x(x)\, f_y(y)\, f_z(z)\, dx\, dy\, dz$$

is, under independence, factorized to

$$\begin{aligned}
P(X \le a,\; Y \le b,\; Z \le c) &= \int_{-\infty}^{c}\int_{-\infty}^{b} f_y(y)\, f_z(z) \left[\int_{-\infty}^{a} f_x(x)\, dx\right] dy\, dz\\
&= \int_{-\infty}^{c} f_z(z) \left[\int_{-\infty}^{b} f_y(y)\, dy \int_{-\infty}^{a} f_x(x)\, dx\right] dz\\
&= \int_{-\infty}^{a} f_x(x)\, dx \int_{-\infty}^{b} f_y(y)\, dy \int_{-\infty}^{c} f_z(z)\, dz\\
&= P(X \le a)\, P(Y \le b)\, P(Z \le c)\,.
\end{aligned}$$
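A small numerical illustration (an added sketch, not from the original text): for three independent standard normal variables, the simulated joint probability $P(X \le a, Y \le b, Z \le c)$ agrees with the product of the marginal probabilities $\Phi(a)\Phi(b)\Phi(c)$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
a, b, c = 0.5, 1.0, -0.3

# Three independent standard normal samples
x, y, z = rng.standard_normal((3, 1_000_000))

# Empirical joint probability P(X <= a, Y <= b, Z <= c)
joint = np.mean((x <= a) & (y <= b) & (z <= c))

# Factorization under independence: product of the marginal probabilities
product = norm.cdf(a) * norm.cdf(b) * norm.cdf(c)

print(f"simulated joint probability: {joint:.4f}")
print(f"product of marginals:        {product:.4f}")
```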
Covariance
In particular, for only two variables a generalization of the expectation operator is considered. Let h be a real-valued function of two variables, $h: \mathbb{R}^2 \to \mathbb{R}$; then we define as a double integral:

$$E[h(X,Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(x,y)\, f_{x,y}(x,y)\, dx\, dy\,.$$
Hence, the covariance between X and Y can be defined as follows:

$$\mathrm{Cov}(X,Y) := E\bigl[(X - E(X))(Y - E(Y))\bigr] = E(XY) - E(X)\, E(Y)\,,$$
where the finiteness of these integrals is again tacitly assumed. It can easily be shown that independence of two variables implies their uncorrelatedness, i.e. $\mathrm{Cov}(X,Y) = 0$, whereas the reverse does not hold true in general. In particular, the covariance only measures the linear relation between two variables. In order to make the measure independent of the units, it is usually standardized as follows:

$$\rho_{xy} = \frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)}\,\sqrt{\mathrm{Var}(Y)}}\,.$$

The correlation coefficient $\rho_{xy}$ is smaller than or equal to one in absolute value; see Problem 2.7.
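The following sketch (an addition, not part of the original text) estimates the covariance and the standardized correlation coefficient from simulated data; it also shows a dependent but uncorrelated pair ($Y = X^2$ with X symmetric around zero), for which the sample correlation is close to zero even though the variables are clearly related:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Linearly related variables: theoretical correlation is 0.8
x = rng.standard_normal(n)
y = 0.8 * x + 0.6 * rng.standard_normal(n)
cov_xy = np.cov(x, y)[0, 1]
rho_xy = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))
print(f"rho_xy for the linear case:     {rho_xy:.3f}")  # close to 0.8

# Dependent but uncorrelated: Y = X^2 with X symmetric around zero
print(f"rho_xy for Y = X^2 (dependent): {np.corrcoef(x, x**2)[0, 1]:.3f}")  # close to 0
```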
Example 2.5 (Bivariate Normal Distribution) Let X and Y be two Gaussian random variables,

$$X \sim \mathcal{N}(\mu_x, \sigma_x^2)\,, \qquad Y \sim \mathcal{N}(\mu_y, \sigma_y^2)\,,$$

with correlation coefficient $\rho$. We talk about a bivariate normal distribution if the joint density takes the following form:
$$f_{x,y}(x,y) = \frac{1}{2\pi\,\sigma_x\,\sigma_y\,\sqrt{1-\rho^2}}\;\varphi_{x,y}(x,y)$$

with $\varphi_{x,y}(x,y)$ equal to

$$\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-\mu_x}{\sigma_x}\right)^2 - 2\rho\,\frac{x-\mu_x}{\sigma_x}\,\frac{y-\mu_y}{\sigma_y} + \left(\frac{y-\mu_y}{\sigma_y}\right)^2\right]\right\}\,.$$

Symbolically, we denote the vector as

$$\begin{pmatrix} X \\ Y \end{pmatrix} \sim \mathcal{N}_2(\mu, \Sigma)\,,$$

where $\mu$ is a vector and $\Sigma$ stands for a symmetric matrix:

$$\mu = \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix}\,, \qquad \Sigma = \begin{pmatrix} \sigma_x^2 & \mathrm{Cov}(X,Y) \\ \mathrm{Cov}(X,Y) & \sigma_y^2 \end{pmatrix}\,.$$

In general, the covariance matrix is defined as follows:

$$\Sigma = E\left[\begin{pmatrix} X - E(X) \\ Y - E(Y) \end{pmatrix}\bigl(X - E(X),\; Y - E(Y)\bigr)\right]\,.$$
Note that in the case of uncorrelatedness ($\rho = 0$) it holds that

$$f_{x,y}(x,y) = \frac{1}{\sqrt{2\pi}\,\sigma_x}\exp\left\{-\frac{(x-\mu_x)^2}{2\sigma_x^2}\right\}\,\frac{1}{\sqrt{2\pi}\,\sigma_y}\exp\left\{-\frac{(y-\mu_y)^2}{2\sigma_y^2}\right\} = f_x(x)\, f_y(y)\,.$$
The joint density function is then the product of the individual densities. Consequently, the random variables X and Y are independent. It thus follows, in particular for the normal distribution, that uncorrelatedness is equivalent to stochastic independence. Furthermore, bivariate Gaussian random variables have the property that each linear combination is univariate normally distributed. More precisely, it holds for $\lambda \in \mathbb{R}^2$ with⁷ $\lambda' = (\lambda_1, \lambda_2)$ that:

$$\lambda'\begin{pmatrix} X \\ Y \end{pmatrix} = \lambda_1 X + \lambda_2 Y \sim \mathcal{N}(\lambda'\mu,\; \lambda'\Sigma\lambda)\,.$$

Interesting special cases are obtained with $\lambda' = (1, 1)$ and $\lambda' = (1, -1)$ for sums and differences. Note furthermore that for multivariate normal distributions necessarily all marginal distributions are normal (with $\lambda' = (1, 0)$ and $\lambda' = (0, 1)$). The reverse does not hold: a bivariate example with Gaussian marginal distributions but without a joint normal distribution is given by Bickel and Doksum (2001, p. 533).
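As an added illustration (a sketch with illustrative parameters, not from the original text), one can simulate a bivariate normal vector and check that a linear combination $\lambda_1 X + \lambda_2 Y$ has mean $\lambda'\mu$ and variance $\lambda'\Sigma\lambda$, as stated above:

```python
import numpy as np

rng = np.random.default_rng(2)

# Bivariate normal parameters (chosen for illustration)
mu = np.array([1.0, -2.0])
sigma_x, sigma_y, rho = 2.0, 1.5, 0.6
Sigma = np.array([[sigma_x**2,              rho * sigma_x * sigma_y],
                  [rho * sigma_x * sigma_y, sigma_y**2]])

xy = rng.multivariate_normal(mu, Sigma, size=500_000)

# Linear combination lambda'(X, Y)' with lambda' = (1, 1), i.e. the sum X + Y
lam = np.array([1.0, 1.0])
combo = xy @ lam

print(f"sample mean / lambda' mu:           {combo.mean():.3f} / {lam @ mu:.3f}")
print(f"sample variance / lambda' S lambda: {combo.var():.3f} / {lam @ Sigma @ lam:.3f}")
```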
Cauchy-Schwarz Inequality
The inequality by Cauchy and Schwarz is the reason why $|\rho_{xy}| \le 1$ applies. The following statement is verified in Problem 2.6.
Lemma 2.2 (Cauchy-Schwarz Inequality) For arbitrary random variables Y and Z it holds that
$$|E(YZ)| \le \sqrt{E(Y^2)}\,\sqrt{E(Z^2)}\,, \qquad (2.5)$$
where finite moments are assumed.
We want to supplement the Cauchy-Schwarz inequality by an intermediate inequality, see (2.8). For this purpose we recall the so-called triangle inequality for two real numbers:

$$|a_1 + a_2| \le |a_1| + |a_2|\,.$$

⁷ Up to this point a superscript prime at a function has denoted its derivative. In the rare cases in which we are concerned with matrices or vectors, the symbol will also be used to indicate transposition. Bearing in mind the respective context, no ambiguity should arise.

Obviously, this can be generalized to:

$$\left|\sum_{i=1}^{n} a_i\right| \le \sum_{i=1}^{n} |a_i|\,.$$
If the sequence is absolutely summable, it is allowed to set $n = \infty$. This suggests that an analogous inequality also applies to integrals. If the function g is continuous, this implies continuity of |g|, and one obtains:

$$\left|\int g(x)\, dx\right| \le \int |g(x)|\, dx\,.$$

This implies for the expected value of a random variable X:

$$|E(X)| \le E(|X|)\,. \qquad (2.6)$$
This relation resembles (2.2); in fact, both relations are special cases of Jensen's inequality.⁸ A random variable is called integrable if $E(|X|) < \infty$. Of course this implies a finite expected value. For integrability a finite second moment is sufficient, which follows from (2.5) with $Y = |X|$ and $Z = 1$:

$$E(|X|) \le \sqrt{E(|X|^2)}\,\sqrt{1^2} = \sqrt{E(X^2)}\,.$$
Now, setting $X = YZ$ in (2.6), it follows that $|E(YZ)| \le E(|Y|\,|Z|)$. This is the bound added to (2.5):

$$|E(YZ)| \le E(|Y|\,|Z|) \le \sqrt{E(Y^2)}\,\sqrt{E(Z^2)}\,. \qquad (2.8)$$

The first inequality follows from (2.6). The second one will be verified in the problem section.
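The following small simulation (an added sketch, not from the original) illustrates the chain of inequalities in (2.8) for two correlated variables:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000

# Two correlated random variables (illustrative choice)
y = rng.standard_normal(n)
z = 0.5 * y + rng.standard_normal(n)

lhs = abs(np.mean(y * z))                               # |E(YZ)|
mid = np.mean(np.abs(y) * np.abs(z))                    # E(|Y||Z|)
rhs = np.sqrt(np.mean(y**2)) * np.sqrt(np.mean(z**2))   # sqrt(E(Y^2)) * sqrt(E(Z^2))

print(f"|E(YZ)| = {lhs:.3f} <= E(|Y||Z|) = {mid:.3f} <= {rhs:.3f}")
```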
⁸ The general statement is: for a convex function g it holds that

$$g(E(X)) \le E(g(X))\,; \qquad (2.7)$$

see e.g. Sydsæter et al. (1999, p. 181), while a proof is given e.g. in Davidson (1994, Ch. 9) or Ross (2010, p. 409).
Conditional Distributions
Conditional distributions and densities, respectively, are defined as the ratio of the joint density and the “conditioning density”, i.e. they are defined by the following density functions (where positive denominators are assumed):
$$f_{x|y}(x) = \frac{f_{x,y}(x,y)}{f_y(y)}\,, \qquad f_{x|y,z}(x) = \frac{f_{x,y,z}(x,y,z)}{f_{y,z}(y,z)}\,, \qquad f_{x,y|z}(x,y) = \frac{f_{x,y,z}(x,y,z)}{f_z(z)}\,.$$

It should be clear that these conditional densities are in fact density functions. In case of independence it holds by definition that the conditional and the unconditional densities are equal, e.g.

$$f_{x|y}(x) = f_x(x)\,.$$
This is very intuitive: in the case of two independent random variables, neither one influences the probabilities with which the other takes on values.
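As an added numerical check (a sketch with illustrative parameters, not part of the original text), the ratio $f_{x,y}(x,y)/f_y(y)$ indeed integrates to one over x, here for a bivariate normal joint density:

```python
import numpy as np
from scipy.stats import multivariate_normal, norm
from scipy.integrate import quad

# Standardized bivariate normal joint density with correlation 0.7
joint = multivariate_normal([0.0, 0.0], [[1.0, 0.7], [0.7, 1.0]])

y_fixed = 1.2  # condition on Y = y

def f_x_given_y(x):
    # conditional density as the ratio of joint and marginal density
    return joint.pdf([x, y_fixed]) / norm.pdf(y_fixed)

total, _ = quad(f_x_given_y, -np.inf, np.inf)
print(f"integral of the conditional density: {total:.6f}")  # close to 1
```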
Conditional Expectation
If the random variables X and Y are not independent and the realization of Y is known, $Y = y$, then the expectation of X will be affected:

$$E(X \mid Y=y) = \int_{-\infty}^{\infty} x\, f_{x|y}(x)\, dx\,.$$

Analogously, we define the conditional expectation of a random variable Z, $Z = h(X,Y)$, $h: \mathbb{R}^2 \to \mathbb{R}$, given $Y = y$ as:

$$E(Z \mid Y=y) = E(h(X,Y) \mid Y=y) = \int_{-\infty}^{\infty} h(x,y)\, f_{x|y}(x)\, dx\,.$$

In particular, for $h(X,Y) = X\, g(Y)$ with $g: \mathbb{R} \to \mathbb{R}$ one therefore obtains

$$E(X\, g(Y) \mid Y=y) = g(y)\int_{-\infty}^{\infty} x\, f_{x|y}(x)\, dx = g(y)\, E(X \mid Y=y)\,.$$

Here, the marginal density of X is replaced by the conditional density given the value $Y = y$.
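Continuing the previous sketch (again an addition with illustrative parameters, not from the original text), $E(X \mid Y = y)$ can be obtained by numerically integrating $x\, f_{x|y}(x)$; for the standardized bivariate normal with correlation $\rho$ this agrees with the well-known conditional mean $\rho\, y$:

```python
import numpy as np
from scipy.stats import multivariate_normal, norm
from scipy.integrate import quad

rho, y_fixed = 0.7, 1.2
joint = multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]])

def f_x_given_y(x):
    return joint.pdf([x, y_fixed]) / norm.pdf(y_fixed)

# Conditional expectation as an integral against the conditional density
cond_mean, _ = quad(lambda x: x * f_x_given_y(x), -np.inf, np.inf)

print(f"numerical E(X|Y=y): {cond_mean:.4f}")
print(f"rho * y:            {rho * y_fixed:.4f}")  # known value for this special case
```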
Technically, we can calculate the density conditioned on the random variable Y instead of on a value⁹ $Y = y$:

$$f_{x|Y}(x) = \frac{f_{x,y}(x,Y)}{f_y(Y)}\,.$$

By $f_{x|Y}(x)$ a transformation of the random variable Y, and consequently a new random variable, is obtained. This is also true for the related conditional expectations:

$$E(X \mid Y) = \int_{-\infty}^{\infty} x\, f_{x|Y}(x)\, dx\,, \qquad E(h(X,Y) \mid Y) = \int_{-\infty}^{\infty} h(x,Y)\, f_{x|Y}(x)\, dx\,.$$
As these are random variables, it is perfectly reasonable to take the expectation of the conditional expectation. This calculation can be carried out by applying a rule known in the literature as the "law of iterated expectations" (LIE); it is given in Proposition 2.1. In order to prevent confusion about whether X or Y is integrated, it is advisable to subscript the expectation operator accordingly:

$$E_y[E_x(X \mid Y)] = \int_{-\infty}^{\infty} \bigl[E_x(X \mid y)\bigr] f_y(y)\, dy = \int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty} x\, f_{x|y}(x)\, dx\right] f_y(y)\, dy\,.$$

Although Y and g(Y) are random variables, after conditioning on Y they can be treated as constants; in the case of a multiplicative composition they can be pulled out of the expectation when integration is with respect to X. This is the second statement in the following proposition, cf. also Davidson (1994, Theorem 10.10). The first statement will be derived in Problem 2.9.
Proposition 2.1 (Conditional Expectation) With the notation introduced above, it holds that:
(a) $E_y[E_x(X \mid Y)] = E_x(X)$,
(b) $E(g(Y)\, X \mid Y) = g(Y)\, E_x(X \mid Y)$ for $h(X,Y) = X\, g(Y)$.
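A brief Monte Carlo check of statement (a) (an added sketch, not from the original text): constructing X so that $E(X \mid Y)$ is known explicitly, the mean of the conditional expectation matches the unconditional mean of X:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000_000

# Construct X = 2 + Y + eps with Y and eps independent, so that E(X | Y) = 2 + Y
y = rng.standard_normal(n)
eps = rng.standard_normal(n)
x = 2.0 + y + eps

cond_exp = 2.0 + y  # E(X | Y) as a random variable, i.e. a function of Y

print(f"E_y[E_x(X|Y)] (simulated): {cond_exp.mean():.3f}")
print(f"E_x(X)        (simulated): {x.mean():.3f}")  # both close to 2
```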
⁹ This is not a really rigorous way of introducing expectations conditioned on random variables. A mathematically correct exposition, however, requires measure-theoretic arguments not available at this point; cf. for example Davidson (1994, Ch. 10) or Klebaner (2005, Ch. 2). More generally, one may define expectations conditioned on a $\sigma$-algebra, $E(X \mid \mathcal{G})$, where $\mathcal{G}$ could be the $\sigma$-algebra generated by Y: $\mathcal{G} = \sigma(Y)$.