Given two random variables, X and Y, with respective means, µX and µY, the covariance is defined by,

Cov(X, Y) = E[(X − µX)(Y − µY)] = E[XY] − µX µY.
The second formula follows by expansion. Notice also that Cov(X, X) = Var(X) by comparing with (3.3). The covariance is a common measure of the relationship between two random variables: if Cov(X, Y) = 0, we say the random variables are uncorrelated. Furthermore, if Cov(X, Y) ≠ 0, then its sign gives an indication of the direction of the relationship.
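To make the two forms concrete, here is a minimal Julia sketch of our own (not one of the text's listings; the setup Y = 2X + noise is an arbitrary assumption) that estimates the covariance both ways from simulated data and checks that they agree:

using Statistics, Random
Random.seed!(1)

N = 10^6
X = randn(N)
Y = 2X .+ randn(N)          # linearly related to X, so Cov(X,Y) = 2

covDef = mean((X .- mean(X)) .* (Y .- mean(Y)))   # E[(X-µX)(Y-µY)]
covExp = mean(X .* Y) - mean(X)*mean(Y)           # E[XY] - µX µY
println(round(covDef, digits=3), "\t", round(covExp, digits=3))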
Another important concept is the correlation coefficient,

ρXY = Cov(X, Y) / √(Var(X) Var(Y)). (3.26)
It is a normalized form of the covariance with −1 ≤ ρXY ≤ 1. Values nearing ±1 indicate a very strong linear relationship between X and Y, whereas values near or at 0 indicate a lack of a linear relationship.
Note that if X and Y are independent random variables, then Cov(X, Y) = 0 and hence ρXY = 0. However, the converse does not always hold: in general, ρXY = 0 does not imply independence. Nevertheless, as described below, for jointly normal random variables it does.
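For instance, take X standard normal and Y = X²: the two are strongly dependent, yet Cov(X, Y) = E[X³] = 0. A minimal sketch of our own verifying this numerically:

using Statistics, Random
Random.seed!(1)

N = 10^6
X = randn(N)
Y = X.^2                              # fully determined by X, hence dependent
println(round(cor(X, Y), digits=3))   # ≈ 0: uncorrelated despite dependence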
Consider now a random vector X = (X1, . . . , Xn), taken as a column vector. It can be described by moments in an analogous manner to a scalar random variable, as was detailed in Section 3.2. A key quantity is the mean vector,
µX := [E[X1], E[X2], . . . , E[Xn]]ᵀ.
Furthermore, the covariance matrix is the matrix defined by the expectation (taken element-wise) of the (outer product) random matrix (X − µX)(X − µX)ᵀ, and is expressed as

ΣX = Cov(X) = E[(X − µX)(X − µX)ᵀ]. (3.27)

As can be verified, the i, j'th element of ΣX is Cov(Xi, Xj), and hence the diagonal elements are the variances.
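As an illustration (a sketch of our own; the two correlated coordinates are arbitrary assumptions), the sample covariance matrix below has the sample variances on its diagonal, matching (3.27):

using Statistics, Random
Random.seed!(1)

N = 10^5
X1 = randn(N)
X2 = X1 .+ 0.5*randn(N)       # correlated with X1

SigHat = cov([X1 X2])         # 2×2 sample covariance matrix
println(round.(SigHat, digits=3))
println(round(var(X1), digits=3), "\t", round(var(X2), digits=3))   # the diagonal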
Linear Combinations and Transformations
We now consider linear transformations applied to random vectors. For any collection of random variables,

E[X1 + . . . + Xn] = E[X1] + . . . + E[Xn].
For uncorrelated random variables,

Var(X1 + . . . + Xn) = Var(X1) + . . . + Var(Xn).

More generally, if we allow the random variables to be correlated, then,

Var(X1 + . . . + Xn) = Var(X1) + . . . + Var(Xn) + 2 ∑_{i<j} Cov(Xi, Xj). (3.28)
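As a quick numerical check of (3.28) (our own sketch with three arbitrarily correlated variables), compare the sample variance of the sum with the right hand side:

using Statistics, Random
Random.seed!(1)

N = 10^6
X1 = randn(N)
X2 = 0.5X1 .+ randn(N)        # correlated with X1
X3 = randn(N)                 # independent of X1 and X2

lhs = var(X1 .+ X2 .+ X3)
rhs = var(X1) + var(X2) + var(X3) +
      2*(cov(X1,X2) + cov(X1,X3) + cov(X2,X3))
println(round(lhs, digits=3), "\t", round(rhs, digits=3))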
Figure 3.25: Random vectors from three different distributions, each sharing the same mean and covariance matrix.
Note that the right hand side of (3.28) is the sum of the elements of the matrix Cov(X1, . . . , Xn). This is a special case of a more general affine transformation, where we take a random vector X = (X1, . . . , Xn) with covariance matrix ΣX, an m × n matrix A, and an m-vector b. We then set,
Y = AX + b. (3.29)

In this case, the new random vector Y exhibits mean and covariance,

E[Y] = A E[X] + b and Cov(Y) = A ΣX Aᵀ. (3.30)

Now to retrieve (3.28), we use the 1 × n matrix A = [1, . . . , 1] and observe that A ΣX Aᵀ is the sum of all of the elements of ΣX.
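To see (3.30) in action, the following sketch (ours; the particular A and b are arbitrary) transforms i.i.d. standard normal coordinates, for which ΣX = I, and checks that the sample mean and covariance of Y approach b and AAᵀ respectively:

using Statistics, Random
Random.seed!(1)

N = 10^5
A = [1.0 2.0 ;
     0.0 3.0]               # arbitrary 2×2 matrix
b = [5.0, -5.0]             # arbitrary shift vector

Y = [A*randn(2) + b for _ in 1:N]

Ymat = reduce(hcat, Y)'     # N×2 matrix, one sample per row
println(round.(mean(Ymat, dims=1), digits=2))   # ≈ bᵀ, since E[Y] = A*0 + b
println(round.(cov(Ymat), digits=2))            # ≈ A*Aᵀ = [5 6; 6 9]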
The Cholesky Decomposition and Generating Random Vectors
Say now that you wish to create an n-dimensional random vector Y with some specified mean vector µY and covariance matrix ΣY. That is, µY and ΣY are known.
The formulas in (3.30) yield a potential recipe for such a task if we are given a random vector X with zero mean and identity covariance matrix (ΣX = I). For example, in the context of Monte Carlo random variable generation, creating such a random vector X is trivial – just generate a sequence of n i.i.d. normal(0, 1) random variables.
Now apply the affine transformation (3.29) on X with b = µY and a matrix A that satisfies,

ΣY = AAᵀ. (3.31)

Now (3.30) guarantees that Y has the desired µY and ΣY.
The question is now how to find a matrix A that satisfies (3.31). For this, the Cholesky decomposition comes as an aid. As an example, assume we wish to generate a random vector Y with,
µY = [15, 20]ᵀ and ΣY = [6 4; 4 9].
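As a quick sanity check before the full listing (a minimal sketch of our own), the lower triangular Cholesky factor indeed satisfies (3.31):

using LinearAlgebra

SigY = [6.0 4.0 ;
        4.0 9.0]
A = cholesky(SigY).L        # lower triangular Cholesky factor
println(A*A' ≈ SigY)        # true, so this A satisfies (3.31)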
Listing 3.32 generates random vectors with this mean vector and covariance matrix using three alternative forms of zero-mean, identity-covariance random variables. As you can see from Figure 3.25, such distributions can be very different in nature even though they share the same first and second order characteristics. The output also presents mean, variance, and covariance estimates of the random variables generated, showing they agree with the specifications above.
Listing 3.32: Generating random vectors with desired mean and covariance
1  using Distributions, LinearAlgebra, LaTeXStrings, Random, Plots; pyplot()
2  Random.seed!(1)
3
4  N = 10^5
5
6  SigY = [ 6 4 ;
7           4 9]
8  muY = [15 ;
9         20]
10 A = cholesky(SigY).L
11
12 rngGens = [()->rand(Normal()),
13            ()->rand(Uniform(-sqrt(3),sqrt(3))),
14            ()->rand(Exponential())-1]
15
16 rv(rg) = A*[rg(),rg()] + muY
17
18 data = [[rv(r) for _ in 1:N] for r in rngGens]
19
20 stats(data) = begin
21     data1, data2 = first.(data), last.(data)
22     println(round(mean(data1),digits=2), "\t", round(mean(data2),digits=2), "\t",
23         round(var(data1),digits=2), "\t", round(var(data2),digits=2), "\t",
24         round(cov(data1,data2),digits=2))
25 end
26
27 println("Mean1\tMean2\tVar1\tVar2\tCov")
28 for d in data
29     stats(d)
30 end
31
32 scatter(first.(data[1]), last.(data[1]), c=:blue, ms=1, msw=0, label="Normal")
33 scatter!(first.(data[2]), last.(data[2]), c=:red, ms=1, msw=0, label="Uniform")
34 scatter!(first.(data[3]), last.(data[3]), c=:green, ms=1, msw=0, label="Exponential",
35     xlims=(0,40), ylims=(0,40), legend=:bottomright, ratio=:equal,
36     xlabel=L"X_1", ylabel=L"X_2")
Mean1   Mean2   Var1   Var2   Cov
14.99   19.99   6.01   9.0    4.0
15.0    20.0    6.01   8.96   3.97
15.0    19.98   6.03   8.85   4.01
We define the covariance matrix SigY and the mean vector muY in lines 6-9. In line 10 we use cholesky() from LinearAlgebra together with .L to compute a lower triangular matrix A that satisfies (3.31). In lines 12-14 we define an array of functions, rngGens, where each element is a function that generates a scalar random variable with zero mean and unit variance. The first entry is a standard normal, the second entry is a uniform on [−√3, √3], and the third entry is a unit exponential shifted by −1. The function we define in line 16, rv(), assumes an input argument which is a function to generate a random value, and then implements the transformation (3.29). In line 18 we create an array of 3 arrays, with each internal array consisting of N 2-dimensional random vectors. We then define a function stats() in lines 20-25 which calculates and prints first and second order statistics. Note the use of begin and end to define the function. The function is then used in lines 27-30 for printing output. The remainder of the code creates Figure 3.25 using data.
Bivariate Normal
One of the most ubiquitous families of multi-variate distributions is the multi-variate normal distribution. Similarly to the fact that a scalar (univariate) normal distribution is parametrized by the mean µ and the variance σ², a multi-variate normal distribution is parametrized by the mean vector µX and the covariance matrix ΣX.
We begin first with the standard multi-variate normal, having mean µX = 0 and identity covariance ΣX = I. In this case, the PDF for the random vector X = (X1, . . . , Xn) is,

f(x) = (2π)^{−n/2} e^{−xᵀx/2}. (3.32)

Listing 3.33 illustrates numerically that this is a valid PDF for increasing dimensions, carrying out the integral (3.24) via numerical integration. As is observed from the output, the integral is accurate for dimensions n = 1, . . . , 8, after which accuracy is lost for the given level of computational effort specified (up to 10^7 function evaluations).
Listing 3.33: Multidimensional integration
using HCubature

M = 4.5
maxD = 10

f(x) = (2*pi)^(-length(x)/2) * exp(-(1/2)*x'x)

for n in 1:maxD
    a = -M*ones(n)
    b = M*ones(n)
    I,e = hcubature(f, a, b, maxevals = 10^7)
    println("n = $(n), integral = $(I), error (estimate) = $(e)")
end
n = 1, integral = 0.9999932046537506, error (estimate) = 4.365848932375016e-10
n = 2, integral = 0.9999864091389514, error (estimate) = 1.487907641465839e-8
n = 3, integral = 0.9999796140804286, error (estimate) = 1.4899542976517278e-8
n = 4, integral = 0.9999728074508313, error (estimate) = 4.4447365681340567e-7
n = 5, integral = 0.999965936103044, error (estimate) = 2.3294669134930872e-5