Lecture-02: Probability Review


1 Probability Review

Definition 1.1 ($\sigma$-algebra). A collection $\mathcal{A}$ of subsets of a set $\Omega$ is called a $\sigma$-algebra if (i) it contains the empty set, (ii) it is closed under complements, and (iii) it is closed under countable unions.

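Definition 1.2. A probability space $(\Omega, \mathcal{F}, P)$ consists of

(i) a set of all possible outcomes, called the sample space and denoted by $\Omega$,

(ii) a $\sigma$-algebra over the sample space, called the event space and denoted by $\mathcal{F}$, and

(iii) a non-negative set function, called the probability and denoted by $P : \mathcal{F} \to [0, 1]$, such that $P(\Omega) = 1$ and $P$ is countably additive over disjoint events, that is, $P(\cup_n A_n) = \sum_n P(A_n)$ for any sequence of pairwise disjoint events $(A_n : n \in \mathbb{N})$.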

An element of the sample space is called an outcome and an element of event space is called an event.

Definition 1.3. A collection of events $\mathcal{E} \subseteq \mathcal{F}$ is called a sub-event space if it is a $\sigma$-algebra over $\Omega$.

Definition 1.4. For a family of events $\mathcal{A} \subseteq \mathcal{F}$, the sub-event space generated by the family $\mathcal{A}$ is the smallest sub-event space containing the family $\mathcal{A}$, and is denoted by $\sigma(\mathcal{A})$.

Remark 1. The sub-event space $\sigma(\mathcal{A})$ contains all the elements of $\mathcal{A}$, together with all sets obtained from them by repeatedly taking complements and countable unions.
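For a finite sample space, the generated sub-event space can be computed by brute force. The sketch below is only an illustration under that finiteness assumption, where countable unions reduce to finite ones; the function name `generated_sigma_algebra` is hypothetical and not part of the lecture.

```python
from itertools import combinations


def generated_sigma_algebra(omega, family):
    """Smallest collection of subsets of the finite set `omega` that
    contains `family` and is closed under complements and unions."""
    omega = frozenset(omega)
    sigma = {frozenset(), omega} | {frozenset(a) for a in family}
    changed = True
    while changed:
        changed = False
        current = list(sigma)
        # Close under complements.
        for a in current:
            complement = omega - a
            if complement not in sigma:
                sigma.add(complement)
                changed = True
        # Close under pairwise unions (enough on a finite space).
        for a, b in combinations(current, 2):
            union = a | b
            if union not in sigma:
                sigma.add(union)
                changed = True
    return sigma


omega = {1, 2, 3, 4}
sigma = generated_sigma_algebra(omega, [{1}, {2}])
print(sorted(sorted(s) for s in sigma))
```

Here the generating family $\{\{1\}, \{2\}\}$ cannot separate the outcomes 3 and 4, so $\{3, 4\}$ is an event while $\{3\}$ is not.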

Example 1.5 (Discrete $\sigma$-algebra). For a finite sample space $\Omega$, the event space $\mathcal{F} = \{A : A \subseteq \Omega\}$ consists of all subsets of the sample space $\Omega$, and is denoted by $2^\Omega$ or $\mathcal{P}(\Omega)$.

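Example 1.6 (Borel $\sigma$-algebra). If the sample space $\Omega = \mathbb{R}$, then the Borel $\sigma$-algebra is generated from the half-open intervals $(-\infty, x]$ by complements and countable unions. That is, $\mathcal{B}(\mathbb{R}) \triangleq \sigma(\{(-\infty, x] : x \in \mathbb{R}\})$. We make the following observations.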

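1. From closure under complements, the open interval $(x, \infty)$ belongs to $\mathcal{B}(\mathbb{R})$ for each $x \in \mathbb{R}$.

2. From closure under countable unions, the open interval $(-\infty, x) = \cup_{n \in \mathbb{N}} (-\infty, x - \frac{1}{n}]$ belongs to $\mathcal{B}(\mathbb{R})$ for each $x \in \mathbb{R}$.

3. From closure under complements, the half-closed interval $[x, \infty) = (-\infty, x)^c$ belongs to $\mathcal{B}(\mathbb{R})$ for each $x \in \mathbb{R}$.

4. From closure under finite intersections (which follows from closure under complements and unions by De Morgan's laws), the closed interval $[x, y] = [x, \infty) \cap (-\infty, y]$ belongs to $\mathcal{B}(\mathbb{R})$ for each $x, y \in \mathbb{R}$.

5. From closure under countable intersections, the singleton $\{x\} = \cap_{n \in \mathbb{N}} [x - \frac{1}{n}, x + \frac{1}{n}]$ belongs to $\mathcal{B}(\mathbb{R})$ for each $x \in \mathbb{R}$.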

1.1 Limits of sets and continuity of probability

There is a natural order of inclusion on sets, through which we can define monotonicity of the probability set function $P$. To define continuity of this set function, we define limits of sets.


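Definition 1.7. For a sequence of sets $(A_n : n \in \mathbb{N})$, we define the limit superior and limit inferior of this set sequence respectively as

$$\limsup_n A_n = \bigcap_{n \in \mathbb{N}} \bigcup_{k \geqslant n} A_k, \qquad \liminf_n A_n = \bigcup_{n \in \mathbb{N}} \bigcap_{k \geqslant n} A_k.$$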

It is easy to check that $\liminf_n A_n \subseteq \limsup_n A_n$. We say that the limit of the set sequence exists if $\limsup_n A_n \subseteq \liminf_n A_n$, and the limit of the set sequence in this case is $\limsup_n A_n$.
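As a small illustration of these definitions, consider a periodic sequence of finite sets. Every tail $\{A_k : k \geqslant n\}$ of a periodic sequence contains exactly the sets of one period, so the tail unions and tail intersections do not depend on $n$. The sketch below relies on that periodicity assumption; the helper name `limsup_liminf_periodic` is hypothetical.

```python
def limsup_liminf_periodic(period):
    """Limit superior and limit inferior of the set sequence
    A_n = period[n % len(period)], assuming exact periodicity."""
    sets = [frozenset(s) for s in period]
    # Each tail union equals the union over one period.
    limsup = frozenset().union(*sets)
    # Each tail intersection equals the intersection over one period.
    liminf = sets[0].intersection(*sets[1:])
    return limsup, liminf


# Alternating sequence {1, 2}, {2, 3}, {1, 2}, {2, 3}, ...
limsup, liminf = limsup_liminf_periodic([{1, 2}, {2, 3}])
limit_exists = limsup <= liminf  # subset test
print(sorted(limsup), sorted(liminf), limit_exists)
```

For the alternating sequence, the outcome 2 lies in all the sets while 1 and 3 each lie in infinitely many of them, so the two limits differ and the limit of the sequence does not exist.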

Theorem 1.8. Probability set function is monotone and continuous.

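Proof. Consider two events $A \subseteq B$, both elements of $\mathcal{F}$. From the additivity of probability over the disjoint events $A$ and $B \setminus A$, we have

$$P(B) = P(A \cup (B \setminus A)) = P(A) + P(B \setminus A) \geqslant P(A).$$

The inequality, and hence monotonicity, follows from the non-negativity of the probability set function, that is, since $P(B \setminus A) \geqslant 0$. For continuity from below, we take an increasing sequence of sets $(A_n : n \in \mathbb{N})$, such that $A_n \subseteq A_{n+1}$ for all $n$. We observe that $A_\infty \triangleq \lim_n A_n = \cup_{n \in \mathbb{N}} A_n$ and $A_n \uparrow A_\infty$. We can define disjoint sets $(E_n : n \in \mathbb{N})$, where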

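$$E_1 = A_1, \qquad E_n = A_n \setminus A_{n-1}, \quad n \geqslant 2.$$

The disjoint sets $E_n$ satisfy $\cup_{i=1}^n E_i = A_n$ for all $n \in \mathbb{N}$ and $\cup_n E_n = \cup_n A_n$. From the above property and the additivity of the probability set function over disjoint sets, it follows that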

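$$P(A_\infty) = P(\cup_n E_n) = \sum_{n \in \mathbb{N}} P(E_n) = \lim_{n \in \mathbb{N}} \sum_{i=1}^n P(E_i) = \lim_{n \in \mathbb{N}} P(\cup_{i=1}^n E_i) = \lim_{n \in \mathbb{N}} P(A_n).$$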

For continuity from above, we take a decreasing sequence of sets $(A_n : n \in \mathbb{N})$, such that $A_{n+1} \subseteq A_n$ for all $n$. We can form an increasing sequence of sets $(B_n : n \in \mathbb{N})$, where $B_n = A_n^c$. Then, the continuity from above follows from the continuity from below, since $P(A_n) = 1 - P(B_n)$ and $\cap_n A_n = (\cup_n B_n)^c$. Continuity of probability for a general sequence of converging sets follows from the definition of the limit superior and limit inferior of a sequence of sets and the continuity of the probability function from above and below.
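The continuity-from-below argument can be checked numerically on a concrete example. The sketch below assumes, purely for illustration, the sample space $\Omega = \mathbb{N}$ with $P(\{k\}) = 2^{-k}$ and the increasing events $A_n = \{1, \dots, n\}$, so that $E_n = \{n\}$ and $A_\infty = \Omega$; none of these choices come from the lecture.

```python
from fractions import Fraction


def p_singleton(k):
    """Assumed law on the positive integers: P({k}) = 2^(-k)."""
    return Fraction(1, 2 ** k)


def p_A(n):
    """P(A_n) for A_n = {1, ..., n}, by additivity over E_i = {i}."""
    return sum((p_singleton(i) for i in range(1, n + 1)), Fraction(0))


# The gap P(Omega) - P(A_n) shrinks to zero as n grows.
gaps = [1 - p_A(n) for n in (1, 5, 10, 20)]
print([float(g) for g in gaps])
```

Exact rational arithmetic is used so that the identity $P(A_n) - P(A_{n-1}) = P(E_n)$ holds without rounding error.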

1.2 Independence

Definition 1.9. For a probability space $(\Omega, \mathcal{F}, P)$, two events $A, B \in \mathcal{F}$ are independent events if $P(A \cap B) = P(A) P(B)$.

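Definition 1.10. Two sub-event spaces $\mathcal{G}$ and $\mathcal{H}$ are called independent if any pair of events $(G, H) \in \mathcal{G} \times \mathcal{H}$ are independent. That is,

$$P(G \cap H) = P(G) P(H), \quad G \in \mathcal{G},\ H \in \mathcal{H}.$$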
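Independence of two events can be verified directly from the product rule on a finite sample space. The sketch below assumes two fair dice with the uniform probability on the 36 outcomes; the events and the helper names `prob` and `independent` are illustrative choices, not taken from the lecture.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs of faces of two fair dice (assumed uniform).
omega = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of the event described by the predicate `event`."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))


def independent(a, b):
    """Check the product rule P(A and B) == P(A) P(B)."""
    return prob(lambda w: a(w) and b(w)) == prob(a) * prob(b)


def first_even(w):
    return w[0] % 2 == 0


def sum_is_7(w):
    return w[0] + w[1] == 7


def sum_is_8(w):
    return w[0] + w[1] == 8


print(independent(first_even, sum_is_7), independent(first_even, sum_is_8))
```

The first die being even is independent of the sum being 7, but not of the sum being 8, which shows that independence is a property of the probabilities and not of how the events are described.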

1.3 Conditional Probability

Definition 1.11. Let $(\Omega, \mathcal{F}, P)$ be a probability space. For events $A, B \in \mathcal{F}$ such that $1 > P(B) > 0$, the conditional probability of event $A$ given event $B$ is defined as

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$
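A short worked instance of this ratio, assuming a single fair die with uniform probability: the events and the helper names `prob` and `cond_prob` are hypothetical, and the code only guards against $P(B) = 0$, which is what the quotient itself requires.

```python
from fractions import Fraction

# Sample space of a single fair die (assumed uniform).
omega = frozenset(range(1, 7))


def prob(event):
    """Probability of a subset of omega under the uniform law."""
    return Fraction(len(frozenset(event) & omega), len(omega))


def cond_prob(a, b):
    """Conditional probability P(A | B) = P(A and B) / P(B)."""
    p_b = prob(b)
    if p_b == 0:
        raise ValueError("conditioning event must have positive probability")
    return prob(frozenset(a) & frozenset(b)) / p_b


A = {4, 5, 6}  # outcome is at least 4
B = {2, 4, 6}  # outcome is even
print(prob(A), cond_prob(A, B))
```

Knowing that the outcome is even raises the probability of $A$ from $1/2$ to $2/3$, so these two events are not independent.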

2 Random variables

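Definition 2.1. A real valued random variable $X$ on a probability space $(\Omega, \mathcal{F}, P)$ is a function $X : \Omega \to \mathbb{R}$ such that for every $x \in \mathbb{R}$, we have

$$X^{-1}(-\infty, x] \triangleq \{\omega \in \Omega : X(\omega) \leqslant x\} \in \mathcal{F}.$$

Recall that the collection $\{(-\infty, x] : x \in \mathbb{R}\}$ generates the Borel $\sigma$-algebra $\mathcal{B}(\mathbb{R})$. Therefore, it follows that $X^{-1}(\mathcal{B}(\mathbb{R})) \subseteq \mathcal{F}$, since the set inverse map $X^{-1}$ preserves complements, unions, and intersections.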
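The fact that the inverse map preserves set operations can be checked on a small finite example. The sketch below assumes the sample space $\{-3, \dots, 3\}$ and the map $X(\omega) = \omega^2$, both chosen only for illustration; `preimage` is a hypothetical helper.

```python
def preimage(X, omega, B):
    """Inverse image X^{-1}(B) = {w in omega : X(w) in B}."""
    return frozenset(w for w in omega if X(w) in B)


def X(w):
    return w * w


omega = frozenset(range(-3, 4))
values = frozenset(X(w) for w in omega)  # range of X: {0, 1, 4, 9}
B1, B2 = frozenset({0, 1}), frozenset({1, 4})

# Complements, unions, and intersections commute with the inverse map.
complement_ok = preimage(X, omega, values - B1) == omega - preimage(X, omega, B1)
union_ok = preimage(X, omega, B1 | B2) == preimage(X, omega, B1) | preimage(X, omega, B2)
intersection_ok = preimage(X, omega, B1 & B2) == preimage(X, omega, B1) & preimage(X, omega, B2)
print(complement_ok, union_ok, intersection_ok)
```

Note that the forward image does not enjoy these properties in general, which is why measurability is stated in terms of inverse images.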


Definition 2.2. For a random variable $X$ defined on the probability space $(\Omega, \mathcal{F}, P)$, we define $\sigma(X)$ as the smallest $\sigma$-algebra formed by inverse mapping of Borel sets, i.e.

$$\sigma(X) \triangleq \sigma(\{X^{-1}(-\infty, x] : x \in \mathbb{R}\}).$$

Remark 2. We note that $\sigma(X)$ is a sub-event space of $\mathcal{F}$, and hence probability is defined for each element of $\sigma(X)$.

Definition 2.3. For a random variable $X$ defined on the probability space $(\Omega, \mathcal{F}, P)$, the corresponding distribution function $F : \mathbb{R} \to [0, 1]$ is defined as

$$F(x) \triangleq (P \circ X^{-1})(-\infty, x], \quad \text{for all } x \in \mathbb{R}.$$

Theorem 2.4. The distribution function $F$ of a random variable $X : \Omega \to \mathbb{R}$ is non-negative, monotone increasing, continuous from the right, and has at most countably many points of discontinuity. Further, if $P \circ X^{-1}(\mathbb{R}) = 1$, then

$$\lim_{x \to -\infty} F(x) = 0, \qquad \lim_{x \to \infty} F(x) = 1.$$

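Proof. Non-negativity and monotonicity of the distribution function follow from the non-negativity and monotonicity of the probability set function, and the fact that for $x_1 < x_2$,

$$X^{-1}(-\infty, x_1] \subseteq X^{-1}(-\infty, x_2].$$

Let $x_n \downarrow x_\infty$ be a decreasing sequence of real numbers. We take the decreasing sequence of sets $A \in \mathcal{F}^{\mathbb{N}}$, where $A_n \triangleq X^{-1}(-\infty, x_n] \in \mathcal{F}$ for $n \in \mathbb{N}$. Then $A_n \downarrow A_\infty = X^{-1}(-\infty, x_\infty]$, and the right continuity of the distribution function follows from the continuity from above of the probability set function. Countable discontinuities follow from the fact that $\lim_{x \to \infty} F(x) \leqslant 1$: since $F$ is monotone, each discontinuity is a jump, and for each $m \in \mathbb{N}$ there can be at most $m$ jumps of size larger than $1/m$. By taking sequences $a_n \downarrow -\infty$ and $b_n \uparrow \infty$, we define sequences of monotone sets $A_n \triangleq X^{-1}(-\infty, a_n] \downarrow \emptyset$ and $B_n \triangleq X^{-1}(-\infty, b_n] \uparrow \Omega$. The result follows from the continuity of the probability function.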

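Example 2.5. One of the simplest families of random variables is that of indicator functions $\mathbf{1} : \mathcal{F} \times \Omega \to \{0, 1\}$. For each event $A \in \mathcal{F}$, we can define an indicator function as

$$\mathbf{1}_A(\omega) = \begin{cases} 1, & \omega \in A, \\ 0, & \omega \notin A. \end{cases}$$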

We make the following observations.

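1. $\mathbf{1}_A$ is a random variable for each $A \in \mathcal{F}$. This follows from the fact that

$$\mathbf{1}_A^{-1}(-\infty, x] = \begin{cases} \emptyset, & x < 0, \\ A^c, & x \in [0, 1), \\ \Omega, & x \geqslant 1. \end{cases}$$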

2. The distribution function $F$ for the random variable $\mathbf{1}_A$ is given by

$$F(x) = \begin{cases} 0, & x < 0, \\ P(A^c), & x \in [0, 1), \\ 1, & x \geqslant 1. \end{cases}$$
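The piecewise form of this distribution function translates directly into code. The sketch below is an illustration only: the factory name `indicator_cdf` and the value $P(A) = 1/3$ are assumptions made for the example.

```python
from fractions import Fraction


def indicator_cdf(p_A):
    """Distribution function of the indicator of an event A with P(A) = p_A."""
    def F(x):
        if x < 0:
            return Fraction(0)   # preimage is the empty set
        if x < 1:
            return 1 - p_A       # preimage is the complement of A
        return Fraction(1)       # preimage is the whole sample space
    return F


F = indicator_cdf(Fraction(1, 3))
print([F(x) for x in (-0.5, 0, 0.5, 1, 2)])
```

The function is constant between the two jump points 0 and 1, and takes the upper value at each jump, which is the right continuity asserted for distribution functions.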

Definition 2.6. A random variable $X$ is independent of an event subspace $\mathcal{E}$, if $\sigma(X)$ and $\mathcal{E}$ are independent event subspaces.

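Remark 3. Since $\sigma(X)$ is generated by the collection $(X^{-1}(-\infty, x] : x \in \mathbb{R})$, it follows that $X$ is independent of $\mathcal{E}$ if and only if for all $x \in \mathbb{R}$ and events $E \in \mathcal{E}$,

$$\mathbb{E}[\mathbf{1}_{\{X \leqslant x\}} \mathbf{1}_E] = P(\{X \leqslant x\} \cap E) = P(\{X \leqslant x\}) P(E) = \mathbb{E}\mathbf{1}_{\{X \leqslant x\}} \, \mathbb{E}\mathbf{1}_E.$$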


2.1 Expectation

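Let $g : \mathbb{R} \to \mathbb{R}$ be a Borel measurable function, i.e. $g^{-1}(-\infty, x] \in \mathcal{B}(\mathbb{R})$ for all $x \in \mathbb{R}$. Then, the expectation of $g(X)$ for a random variable $X$ with distribution function $F$ is defined as

$$\mathbb{E} g(X) = \int_{x \in \mathbb{R}} g(x) \, dF(x).$$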

Remark 4. Recall that probabilities are defined only for events. For a random variable $X$, the probabilities are defined for the generating events $X^{-1}(-\infty, x] \in \mathcal{F}$ by $F(x) = P \circ X^{-1}(-\infty, x]$.

Remark 5. The expectation is only defined for random variables. For an event $A$, the probability $P(A)$ equals the expectation of the indicator random variable $\mathbf{1}_A$.
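For a random variable taking finitely many values, the integral against $dF$ reduces to a sum of $g(x)$ weighted by the jumps of $F$. The sketch below assumes such a discrete setting, with $F$ equal to zero below the smallest jump point; the helper `expectation` and the value $P(A) = 1/3$ are illustrative assumptions.

```python
from fractions import Fraction


def expectation(g, cdf, atoms):
    """Sum of g(x) times the jump of cdf at x, over the jump points `atoms`.
    Assumes cdf is a step function that is zero below the smallest atom."""
    total = Fraction(0)
    previous = Fraction(0)
    for x in sorted(atoms):
        jump = cdf(x) - previous  # mass placed at x
        total += g(x) * jump
        previous = cdf(x)
    return total


p_A = Fraction(1, 3)


def cdf(x):
    """Distribution function of the indicator of an event A with P(A) = p_A."""
    if x < 0:
        return Fraction(0)
    if x < 1:
        return 1 - p_A
    return Fraction(1)


mean = expectation(lambda x: x, cdf, [0, 1])
variance = expectation(lambda x: (x - mean) ** 2, cdf, [0, 1])
print(mean, variance)
```

With $g$ the identity, the sum returns $P(A)$, which is the statement that the probability of an event equals the expectation of its indicator.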

3 Random Vectors

Definition 3.1. If $X_1, \dots, X_n$ are random variables defined on the same probability space $(\Omega, \mathcal{F}, P)$, then the vector $X \triangleq (X_1, \dots, X_n)$ is a random mapping $\Omega \to \mathbb{R}^n$ and is called a random vector. Since each $X_i$ is a random variable, the joint event $\cap_{i \in [n]} X_i^{-1}(-\infty, x_i] \in \mathcal{F}$, and the joint distribution of the random vector $X$ is defined as

$$F_X(x_1, \dots, x_n) = P\left(\cap_{i \in [n]} X_i^{-1}(-\infty, x_i]\right), \quad \text{for all } x \in \mathbb{R}^n.$$

Definition 3.2. Two random variables $X, Y$ are independent if $\sigma(X)$ and $\sigma(Y)$ are independent event subspaces.

Remark 6. Since $\sigma(X)$ and $\sigma(Y)$ are generated by the collections $(X^{-1}(-\infty, x] : x \in \mathbb{R})$ and $(Y^{-1}(-\infty, y] : y \in \mathbb{R})$, it follows that the random variables $X, Y$ are independent if and only if for all $x, y \in \mathbb{R}$, we have

$$F_{X,Y}(x, y) = F_X(x) F_Y(y).$$

Definition 3.3. A random vector $X : \Omega \to \mathbb{R}^n$ is independent if the joint distribution is the product of the marginals. That is,

$$F_X(x) = \prod_{i=1}^n F_{X_i}(x_i), \quad \text{for all } x \in \mathbb{R}^n.$$
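The product criterion can be tested exhaustively when the random variables take finitely many values, since the joint and marginal distribution functions are then step functions determined by their values on the grid of attainable values. The sketch below assumes two fair coin flips with uniform probability; the names `joint_cdf` and `factorizes` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space: two fair coin flips, encoded as 0/1 (assumed uniform).
omega = list(product((0, 1), repeat=2))


def prob(event):
    """Probability of the event described by the predicate `event`."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))


def joint_cdf(rvs, xs):
    """Joint distribution function P(X_1 <= x_1, ..., X_n <= x_n)."""
    return prob(lambda w: all(rv(w) <= x for rv, x in zip(rvs, xs)))


def factorizes(rvs, grid):
    """Check that the joint CDF equals the product of marginals on `grid`."""
    for xs in product(grid, repeat=len(rvs)):
        marginals = Fraction(1)
        for rv, x in zip(rvs, xs):
            marginals *= joint_cdf([rv], [x])
        if joint_cdf(rvs, xs) != marginals:
            return False
    return True


def X(w):
    return w[0]          # first flip


def Y(w):
    return w[1]          # second flip


def S(w):
    return w[0] + w[1]   # number of heads


grid = (0, 1, 2)
print(factorizes([X, Y], grid), factorizes([X, S], grid))
```

The two flips form an independent vector, while the first flip and the number of heads do not, because the joint distribution function at $(0, 0)$ is $1/4$ whereas the product of the marginals is $1/8$.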
