Given two random variables, X and Y, with respective means, µX and µY, the covariance is defined by,

Cov(X, Y) = E[(X − µX)(Y − µY)] = E[XY] − µX µY.
The second formula follows by expansion. Notice also that Cov(X, X) = Var(X) by comparing with (3.3). The covariance is a common measure of the relationship between two random variables: if Cov(X, Y) = 0, we say the random variables are uncorrelated. Furthermore, if Cov(X, Y) ≠ 0, then its sign gives an indication of the direction of the relationship.
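To make the two forms concrete, here is a minimal Julia sketch of our own (not one of the text's listings; the setup Y = 2X + noise is an arbitrary assumption) that estimates the covariance both ways from simulated data and checks that they agree:

using Statistics, Random
Random.seed!(1)

N = 10^6
X = randn(N)
Y = 2X .+ randn(N)          # linearly related to X, so Cov(X,Y) = 2

covDef = mean((X .- mean(X)) .* (Y .- mean(Y)))   # E[(X-µX)(Y-µY)]
covExp = mean(X .* Y) - mean(X)*mean(Y)           # E[XY] - µX µY
println(round(covDef, digits=3), "\t", round(covExp, digits=3))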
Another important concept is the correlation coefficient,

ρXY = Cov(X, Y) / √(Var(X) Var(Y)). (3.26)
It is a normalized form of the covariance with −1 ≤ ρXY ≤ 1. Values nearing ±1 indicate a very strong linear relationship between X and Y, whereas values near or at 0 indicate a lack of a linear relationship.
Note that if X and Y are independent random variables, then Cov(X, Y) = 0 and hence ρXY = 0. However, the converse does not always hold: in general, ρXY = 0 does not imply independence. Nevertheless, as described below, for jointly normal random variables it does.
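For instance, take X standard normal and Y = X²: the two are strongly dependent, yet Cov(X, Y) = E[X³] = 0. A minimal sketch of our own verifying this numerically:

using Statistics, Random
Random.seed!(1)

N = 10^6
X = randn(N)
Y = X.^2                              # fully determined by X, hence dependent
println(round(cor(X, Y), digits=3))   # ≈ 0: uncorrelated despite dependence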
Consider now a random vector X = (X1, . . . , Xn), taken as a column vector. It can be described by moments in an analogous manner to a scalar random variable, as was detailed in Section 3.2. A key quantity is the mean vector,
µX := [E[X1], E[X2], . . . , E[Xn]]ᵀ.
Furthermore, the covariance matrix is the matrix defined by the expectation (taken element-wise) of the (outer product) random matrix (X − µX)(X − µX)ᵀ, and is expressed as

ΣX = Cov(X) = E[(X − µX)(X − µX)ᵀ]. (3.27)

As can be verified, the i, j'th element of ΣX is Cov(Xi, Xj), and hence the diagonal elements are the variances.
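As an illustration (a sketch of our own; the two correlated coordinates are arbitrary assumptions), the sample covariance matrix below has the sample variances on its diagonal, matching (3.27):

using Statistics, Random
Random.seed!(1)

N = 10^5
X1 = randn(N)
X2 = X1 .+ 0.5*randn(N)       # correlated with X1

SigHat = cov([X1 X2])         # 2×2 sample covariance matrix
println(round.(SigHat, digits=3))
println(round(var(X1), digits=3), "\t", round(var(X2), digits=3))   # the diagonal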
Linear Combinations and Transformations
We now consider linear transformations applied to random vectors. For any collection of random variables,

E[X1 + . . . + Xn] = E[X1] + . . . + E[Xn].
For uncorrelated random variables,

Var(X1 + . . . + Xn) = Var(X1) + . . . + Var(Xn).

More generally, if we allow the random variables to be correlated, then,

Var(X1 + . . . + Xn) = Var(X1) + . . . + Var(Xn) + 2 ∑_{i<j} Cov(Xi, Xj). (3.28)
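As a quick numerical check of (3.28) (our own sketch with three arbitrarily correlated variables), compare the sample variance of the sum with the right hand side:

using Statistics, Random
Random.seed!(1)

N = 10^6
X1 = randn(N)
X2 = 0.5X1 .+ randn(N)        # correlated with X1
X3 = randn(N)                 # independent of X1 and X2

lhs = var(X1 .+ X2 .+ X3)
rhs = var(X1) + var(X2) + var(X3) +
      2*(cov(X1,X2) + cov(X1,X3) + cov(X2,X3))
println(round(lhs, digits=3), "\t", round(rhs, digits=3))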
Figure 3.25: Random vectors from three different distributions, each sharing the same mean and covariance matrix.
Note that the right hand side of (3.28) is the sum of the elements of the matrix Cov(X1, . . . , Xn). This is a special case of a more general affine transformation, where we take a random vector X = (X1, . . . , Xn) with covariance matrix ΣX, an m × n matrix A, and an m-vector b. We then set,
Y = AX + b. (3.29)

In this case, the new random vector Y exhibits mean and covariance,

E[Y] = A E[X] + b and Cov(Y) = A ΣX Aᵀ. (3.30)

Now to retrieve (3.28), we use the 1 × n matrix A = [1, . . . , 1] and observe that A ΣX Aᵀ is the sum of all of the elements of ΣX.
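To see (3.30) in action, the following sketch (ours; the particular A and b are arbitrary) transforms i.i.d. standard normal coordinates, for which ΣX = I, and checks that the sample mean and covariance of Y approach b and AAᵀ respectively:

using Statistics, Random
Random.seed!(1)

N = 10^5
A = [1.0 2.0 ;
     0.0 3.0]               # arbitrary 2×2 matrix
b = [5.0, -5.0]             # arbitrary shift vector

Y = [A*randn(2) + b for _ in 1:N]

Ymat = reduce(hcat, Y)'     # N×2 matrix, one sample per row
println(round.(mean(Ymat, dims=1), digits=2))   # ≈ bᵀ, since E[Y] = A*0 + b
println(round.(cov(Ymat), digits=2))            # ≈ A*Aᵀ = [5 6; 6 9]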
The Cholesky Decomposition and Generating Random Vectors
Say now that you wish to create an n-dimensional random vector Y with some specified mean vector µY and covariance matrix ΣY. That is, µY and ΣY are known.
The formulas in (3.30) yield a potential recipe for such a task if we are given a random vector X with zero mean and identity covariance matrix (ΣX = I). For example, in the context of Monte Carlo random variable generation, creating such a random vector X is trivial – just generate a sequence of n i.i.d. normal(0, 1) random variables.
Now apply the affine transformation (3.29) on X with b = µY and a matrix A that satisfies,

ΣY = AAᵀ. (3.31)

Now (3.30) guarantees that Y has the desired µY and ΣY.
The question is now how to find a matrix A that satisfies (3.31). For this, the Cholesky decomposition comes as an aid. As an example, assume we wish to generate a random vector Y with,
µY = [15, 20]ᵀ and ΣY = [6 4; 4 9].
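As a quick sanity check before the full listing (a minimal sketch of our own), the lower triangular Cholesky factor indeed satisfies (3.31):

using LinearAlgebra

SigY = [6.0 4.0 ;
        4.0 9.0]
A = cholesky(SigY).L        # lower triangular Cholesky factor
println(A*A' ≈ SigY)        # true, so this A satisfies (3.31)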
Listing 3.32 generates random vectors with this mean vector and covariance matrix using three alternative forms of zero-mean, identity-covariance random variables. As you can see from Figure 3.25, such distributions can be very different in nature even though they share the same first and second order characteristics. The output also presents mean, variance, and covariance estimates of the random variables generated, showing they agree with the specifications above.
Listing 3.32: Generating random vectors with desired mean and covariance
1  using Distributions, LinearAlgebra, LaTeXStrings, Random, Plots; pyplot()
2  Random.seed!(1)
3
4  N = 10^5
5
6  SigY = [ 6 4 ;
7           4 9]
8  muY = [15 ;
9         20]
10 A = cholesky(SigY).L
11
12 rngGens = [()->rand(Normal()),
13            ()->rand(Uniform(-sqrt(3),sqrt(3))),
14            ()->rand(Exponential())-1]
15
16 rv(rg) = A*[rg(),rg()] + muY
17
18 data = [[rv(r) for _ in 1:N] for r in rngGens]
19
20 stats(data) = begin
21     data1, data2 = first.(data), last.(data)
22     println(round(mean(data1),digits=2), "\t", round(mean(data2),digits=2), "\t",
23         round(var(data1),digits=2), "\t", round(var(data2),digits=2), "\t",
24         round(cov(data1,data2),digits=2))
25 end
26
27 println("Mean1\tMean2\tVar1\tVar2\tCov")
28 for d in data
29     stats(d)
30 end
31
32 scatter(first.(data[1]), last.(data[1]), c=:blue, ms=1, msw=0, label="Normal")
33 scatter!(first.(data[2]), last.(data[2]), c=:red, ms=1, msw=0, label="Uniform")
34 scatter!(first.(data[3]), last.(data[3]), c=:green, ms=1, msw=0, label="Exponential",
35     xlims=(0,40), ylims=(0,40), legend=:bottomright, ratio=:equal,
36     xlabel=L"X_1", ylabel=L"X_2")
Mean1   Mean2   Var1   Var2   Cov
14.99   19.99   6.01   9.0    4.0
15.0    20.0    6.01   8.96   3.97
15.0    19.98   6.03   8.85   4.01
We define the covariance matrix SigY and the mean vector muY in lines 6-9. In line 10 we use cholesky() from LinearAlgebra together with .L to compute a lower triangular matrix A that satisfies (3.31). In lines 12-14 we define an array of functions, rngGens, where each element is a function that generates a scalar random variable with zero mean and unit variance. The first entry is a standard normal, the second entry is a uniform on [−√3, √3], and the third entry is a unit exponential shifted by −1. The function we define in line 16, rv(), assumes an input argument which is a function to generate a random value, and then implements the transformation (3.29). In line 18 we create an array of 3 arrays, with each internal array consisting of N 2-dimensional random vectors. We then define a function stats() in lines 20-25 which calculates and prints first and second order statistics. Note the use of begin and end to define the function. The function is then used in lines 27-30 for printing output. The remainder of the code creates Figure 3.25 using data.
Bivariate Normal
One of the most ubiquitous families of multi-variate distributions is the multi-variate normal distribution. Similarly to the fact that a scalar (univariate) normal distribution is parametrized by the mean µ and the variance σ², a multi-variate normal distribution is parametrized by the mean vector µX and the covariance matrix ΣX.
We begin first with the standard multi-variate normal, having mean µX = 0 and identity covariance ΣX = I. In this case, the PDF for the random vector X = (X1, . . . , Xn) is,

f(x) = (2π)^{−n/2} e^{−xᵀx/2}. (3.32)

Listing 3.33 illustrates numerically that this is a valid PDF for increasing dimensions, carrying out the integral (3.24) via numerical integration. As is observed from the output, the integral is accurate for dimensions n = 1, . . . , 8, after which accuracy is lost for the given level of computational effort specified (up to 10^7 function evaluations).
Listing 3.33: Multidimensional integration
using HCubature

M = 4.5
maxD = 10

f(x) = (2*pi)^(-length(x)/2) * exp(-(1/2)*x'x)

for n in 1:maxD
    a = -M*ones(n)
    b = M*ones(n)
    I,e = hcubature(f, a, b, maxevals = 10^7)
    println("n = $(n), integral = $(I), error (estimate) = $(e)")
end
n = 1, integral = 0.9999932046537506, error (estimate) = 4.365848932375016e-10
n = 2, integral = 0.9999864091389514, error (estimate) = 1.487907641465839e-8
n = 3, integral = 0.9999796140804286, error (estimate) = 1.4899542976517278e-8
n = 4, integral = 0.9999728074508313, error (estimate) = 4.4447365681340567e-7
n = 5, integral = 0.999965936103044, error (estimate) = 2.3294669134930872e-5