The treatment is based on their use in the study of point processes, discontinuous martingales, Markov processes with jumps, and especially Lévy processes. The presentation is aligned in form and style with modern treatments of probability theory and stochastic processes.
Sigma-algebras
If the intersection of every countable collection of sets in $\mathcal{C}$ belongs to $\mathcal{C}$, then we say that $\mathcal{C}$ is closed under countable intersections.
A collection of subsets of $E$ is a σ-algebra if and only if it is both a p-system and a d-system on $E$. To show sufficiency, let $\mathcal{E}$ be a collection of subsets of $E$ that is both a p-system and a d-system.
Monotone class theorem
It is clear that a σ-algebra is both a p-system and a d-system, and the converse will be shown next. Second, it is closed under unions: $A, B \in \mathcal{E} \Rightarrow A \cup B \in \mathcal{E}$, because $A \cup B = (A^c \cap B^c)^c$ and $\mathcal{E}$ is closed under complements (as shown) and under intersections by the hypothesis that it is a p-system.
Measurable spaces
Products of measurable spaces
Exercises
Measurable functions
Composition of functions
The next proposition will be remembered by the expression "measurable functions of measurable functions are measurable". If $f$ is measurable with respect to $\mathcal{E}$ and $\mathcal{F}$, and $g$ with respect to $\mathcal{F}$ and $\mathcal{G}$, then $g \circ f$ is measurable with respect to $\mathcal{E}$ and $\mathcal{G}$.
Numerical functions
Positive and negative parts of a function
The decomposition $f = f^+ - f^-$ allows us to obtain many results for arbitrary functions from the corresponding results for positive functions.
Indicators and simple functions
Limits of sequences of functions
The rightmost member belongs to $\mathcal{E}$: for each $n$, the set $f_n^{-1}[-\infty, r]$ is in $\mathcal{E}$ by the $\mathcal{E}$-measurability of $f_n$, and $\mathcal{E}$ is closed under countable intersections.
Approximation of measurable functions
A positive function on $E$ is $\mathcal{E}$-measurable if and only if it is the limit of an increasing sequence of positive simple functions. We must show that there is a sequence $(f_n)$ of positive simple functions increasing to $f$.
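For concreteness, the classical choice is the dyadic approximation $f_n = \sum_{k=0}^{n2^n - 1} k\,2^{-n}\, 1_{\{k2^{-n} \le f < (k+1)2^{-n}\}} + n\, 1_{\{f \ge n\}}$. The following sketch evaluates this construction numerically (the test function and evaluation points are ours, for illustration only):

```python
import numpy as np

def dyadic_approximation(f, n):
    """n-th dyadic approximation of a positive function f:
    f_n = k / 2^n on {k/2^n <= f < (k+1)/2^n} for k < n * 2^n,
    and f_n = n on {f >= n}.  Each f_n is simple, and f_n increases
    pointwise to f as n -> infinity."""
    def fn(x):
        v = f(x)
        return float(n) if v >= n else np.floor(v * 2**n) / 2**n
    return fn

f = lambda x: x * x                       # a sample positive function
for n in (1, 2, 4, 8):
    fn = dyadic_approximation(f, n)
    print(n, [fn(x) for x in (0.3, 0.7, 1.5, 3.0)])   # increases toward f
```

Note that each $f_n$ takes only finitely many values, so it is simple, and the measurability of each level set follows from that of $f$.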
Monotone classes of functions
Standard measurable spaces
Notation
Exercises and complements
Show that a positive function $f$ on $E$ is $\mathcal{E}$-measurable if and only if it has the form $f = \sum_n c_n 1_{A_n}$ for some sequence $(c_n)$ in $\mathbb{R}_+$ and some sequence $(A_n)$ of sets in $\mathcal{E}$. We can think of $m(x)$ as the mass associated with the point $x$, and then $\mu(A) = \sum_{x \in A} m(x)$ is the total mass on the set $A$.
Some properties
Arithmetic of measures
Finite, σ-finite, Σ-finite measures
Specification of measures
Atoms, purely atomic measures, diffuse measures
Completeness, negligible sets
Almost everywhere
Hint: Let $(E_n)$ be a measurable partition of $E$ such that $\mu(E_n) < \infty$ for each $n$; define $\mu_n$ as the trace of $\mu$ on $E_n$ as in Exercise 3.11a; show that $\mu = \sum_n \mu_n$. b) Show that the Lebesgue measure on $\mathbb{R}$ is σ-finite. At the end we will also show that 4.2 characterizes integration. a) Let $f$ be simple and positive.
Integrability
If the Riemann integral exists, then so does the Lebesgue integral, and the two integrals are equal. The converse is false; the Lebesgue integral exists for a larger class of functions than the Riemann integral.
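A standard example of the gap is the indicator of the rationals in $[0,1]$: with $\lambda$ denoting the Lebesgue measure,
$$\int_{[0,1]} 1_{\mathbb{Q}} \, d\lambda = \lambda(\mathbb{Q} \cap [0,1]) = 0,$$
since $\mathbb{Q} \cap [0,1]$ is countable and hence $\lambda$-negligible; but every Riemann sum can be made equal to 0 or to 1 by the choice of evaluation points, so the Riemann integral does not exist.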
Integral over a set
Positivity and monotonicity
Monotone Convergence Theorem
As $(f_n)$ increases, the integrals $\mu f_n$ form an increasing sequence of numbers by the monotonicity property of Proposition 4.7. The following steps show that the reverse inequality also holds. a) Fix $b$ in $\mathbb{R}_+$ and $B$ in $\mathcal{E}$.
Linearity of integration
Insensitivity of the integral
Fatou’s lemma
Dominated convergence theorem
If $\lim f_n$ exists, then $\liminf f_n = \limsup f_n = \lim f_n$, and $\lim f_n$ is integrable since it is dominated by $g$.
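As a numerical illustration (the setup is ours): take $f_n(x) = x^n$ on $[0,1]$ with dominating function $g \equiv 1$; pointwise $f_n \to 0$ except at $x = 1$, so the limit is 0 almost everywhere and the integrals must tend to 0, as the theorem predicts.

```python
import numpy as np

# Dominated convergence, numerically: f_n(x) = x**n on [0, 1], dominated
# by g = 1.  Pointwise f_n(x) -> 0 for x < 1, so the limit function is 0
# Lebesgue-almost everywhere, and the integrals must converge to 0.
x = np.random.default_rng(0).uniform(0.0, 1.0, 1_000_000)  # Monte Carlo sample
for n in (1, 5, 25, 125, 625):
    print(f"n = {n:4d}   integral of f_n ~ {(x**n).mean():.6f}")  # exact: 1/(n+1)
```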
Almost everywhere versions
Likewise, $f_n \ge 0$ almost everywhere means that there is $M_n$ in $\mathcal{N}$ such that $f_n \ge 0$ outside $M_n$. The reader is invited to formulate the "almost everywhere version" of the dominated convergence theorem and to prove it carefully once.
Characterization of the integral
The necessity of the conditions is immediate from the properties of the integral: (a) follows from the definition of $\mu f$, (b) from linearity, and (c) from the monotone convergence theorem. In fact, the class of measures that can be represented as images of the Lebesgue measure on $\mathbb{R}_+$ is very large.
Indefinite integrals
Radon-Nikodym theorem
If $\nu$ is an indefinite integral of some positive $\mathcal{E}$-measurable function with respect to $\mu$, then it is clear from 5.5 that $\nu$ is absolutely continuous with respect to $\mu$. The function in question can be denoted by $d\nu/d\mu$ in view of the equivalence of 5.5-5.9 and 5.12; and the function $p$ is also called the Radon-Nikodym derivative of $\nu$ with respect to $\mu$.
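On a countable space the Radon-Nikodym derivative reduces to a ratio of point masses; here is a minimal sketch (the two measures are ours, chosen for illustration):

```python
# Radon-Nikodym on a finite set: if mu({x}) > 0 whenever nu({x}) > 0
# (absolute continuity), then p(x) = nu({x}) / mu({x}) satisfies
# nu(A) = integral of p over A with respect to mu, for every A.
mu = {"a": 0.2, "b": 0.3, "c": 0.5}
nu = {"a": 0.1, "b": 0.6, "c": 0.3}

p = {x: nu[x] / mu[x] for x in mu}        # the derivative d(nu)/d(mu)

A = {"a", "c"}
print(sum(nu[x] for x in A))              # nu(A)                      -> 0.4
print(sum(p[x] * mu[x] for x in A))       # indefinite integral over A -> 0.4
```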
A matter of style
If $\mu$ is absolutely continuous with respect to the Lebesgue measure $\lambda$ on $\mathbb{R}_+$, then its cumulative distribution function is differentiable at $\lambda$-almost every point of $\mathbb{R}_+$. Show that, conversely, if $\mu$ is absolutely continuous with respect to a finite measure $\nu$, then $\mu$ is Σ-finite.
Measure-kernel-function
This special case will inform the choice of notations such as $Kf$ and $\mu K$ below (recall that functions are thought of as generalizations of column vectors, and measures as generalizations of row vectors). To specify a kernel $K$ from $(E, \mathcal{E})$ to $(F, \mathcal{F})$, it is enough to specify $Kf$ for each $f$ in $\mathcal{F}_+$. b) $K(af + bg) = aKf + bKg$ for $f$ and $g$ in $\mathcal{F}_+$ and $a$ and $b$ in $\mathbb{R}_+$; c) $Kf_n \nearrow Kf$ for each sequence $(f_n)$ in $\mathcal{F}_+$ increasing to $f$.
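On finite spaces the vector analogy is literal: a kernel is a matrix with positive entries, $Kf$ is a matrix-vector product, and $\mu K$ is a row-vector-matrix product. A small sketch (the numbers are ours):

```python
import numpy as np

# Kernel from a 2-point space E to a 3-point space F: K[x, y] = K(x, {y}).
K = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.2, 0.7]])
f = np.array([1.0, 2.0, 4.0])    # positive function on F (column vector)
mu = np.array([0.3, 0.7])        # measure on E (row vector)

Kf = K @ f      # function on E:  (Kf)(x) = integral of f against K(x, .)
muK = mu @ K    # measure on F:   (muK)(B) = integral of K(., B) against mu
print(Kf, muK)
print(mu @ Kf, muK @ f)          # mu(Kf) == (mu K)(f): the two orders agree
```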
Products of kernels, Markov kernels
Kernels finite and bounded
Functions on product spaces
Property (a) is immediate from the linearity of integration with respect to $K_x$ for each $x$, and the continuity property (b) follows from the monotone convergence theorem for the measure $K_x$. Since the measurable rectangles generate the σ-algebra $\mathcal{E} \otimes \mathcal{F}$, it follows from the monotone class theorem 2.19 that $\mathcal{M}$ includes all positive (or bounded) $f$ in $\mathcal{E} \otimes \mathcal{F}$, assuming that $K$ is bounded.
Measures on the product space
Note that the measurable rectangles $E_m \times F_n$ form a partition of $E \times F$, and recall formula 6.13 for $\pi$ and $\hat\pi$. Thus, the measures $\pi$ and $\hat\pi$ are σ-finite, agree on a p-system of measurable rectangles generating $\mathcal{E} \otimes \mathcal{F}$, and this p-system contains a partition of $E \times F$ over which $\pi$ and $\hat\pi$ are finite.
Product measures and Fubini
Assuming that $\mu$ is σ-finite and $K$ is σ-bounded, it remains to be shown that $\pi$ is σ-finite and is the only measure that satisfies 6.13. To prove that $\pi f = \hat\pi \hat f$ it is sufficient to show that $(\mu_i \times \nu_j) f = (\nu_j \times \mu_i) \hat f$ for each pair $i$ and $j$.
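For finite discrete measures, the content of 6.13 is just the interchange of two orders of summation; a quick numerical check (setup ours):

```python
import numpy as np

rng = np.random.default_rng(1)
mu = rng.random(4)          # finite measure on a 4-point space E
nu = rng.random(5)          # finite measure on a 5-point space F
f = rng.random((4, 5))      # positive function on E x F

# pi f: integrate in y first, then in x; pi-hat f-hat: the other order.
first_y = (f * nu).sum(axis=1) @ mu
first_x = (f.T * mu).sum(axis=1) @ nu
print(first_y, first_x)     # equal up to rounding, as Fubini asserts
```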
Finite products
Infinite products
Since measurable rectangles generate the product σ-algebra $\mathcal{F}$, this implies by Proposition 2.3 that $f$ is measurable with respect to $\mathcal{H}$ and $\mathcal{F}$. In general, in order for 6.15 to hold, it is necessary that $\mu$ and $\nu$ be Σ-finite.
Complements
A probability space is a triple $(\Omega, \mathcal{H}, P)$ where $\Omega$ is a set, $\mathcal{H}$ is a σ-algebra on $\Omega$, and $P$ is a probability measure on $(\Omega, \mathcal{H})$. Setting up the correspondence between the experiment and the model requires a lot of thought and experience; it is rarely made explicit, and it determines the quality of the probability space as a model of the experiment in question.
Negligibility, completeness
These are all the same as before for arbitrary measures, except for the sequential continuity under decreasing limits, which is made possible by the finiteness of $P$: if $H_1 \supset H_2 \supset \cdots$ and $\bigcap_n H_n = H$, then the complements $H_n^c$ increase to $H^c$, implying that $P(H_n^c) \nearrow P(H^c)$ by the sequential continuity of measures under increasing limits; and we have $P(H) = 1 - P(H^c)$, and similarly for each $H_n$, by the finite additivity and normalization of $P$.
Almost surely, almost everywhere
Random variables
Distribution of a random variable
Functions of random variables
Joint distributions
Independence
An arbitrary (countable or uncountable) collection of random variables is said to be independent if every finite subset of it is independent.
Stochastic processes and probability laws
Let $\gamma_a$ denote the standard gamma distribution with shape index $a$; this is the probability measure $\mu$ of Example 1.13 above, but with $c = 1$.
Properties of expectation
The change in notation serves to highlight the important change in our interpretation of $EX$: the integral $PX$ is the "area under the function" $X$ in a generalized sense, whereas the expectation $EX$ is the "weighted average of the values" of $X$, the weight distribution being specified by $P$, with total weight $P(\Omega) = 1$. See Figure 2 above for the distinction.
Expectations and integrals
The converse statement is useful for figuring out the distribution of $X$ in cases where $X$ is a known function of other random variables whose joint distribution is known. For a measure $\mu$ to be the distribution of $X$, it is necessary and sufficient that $E(f \circ X) = \mu f$ for every positive $\mathcal{E}$-measurable $f$.
Means, variances, Laplace and Fourier transforms
Show that their sum of squares has the gamma distribution with shape index $n/2$ and scale $1/2$. Let $X$ and $Y$ be independent, $X$ with gamma distribution $\gamma_{a,c}$ (with shape index $a$ and scale parameter $c$) and $Y$ with the standard Gaussian distribution. Show that $\sqrt{b}\,Y$ has the Gaussian distribution with mean 0 and variance $b > 0$.
Inequalities
For each $p$ in $[1, \infty]$, let $L^p$ denote the collection of all real-valued random variables $X$ with $\|X\|_p < \infty$. For $p$ in $[1, \infty)$, $X$ is in $L^p$ if and only if $|X|^p$ is integrable; and $X$ is in $L^\infty$ if and only if $X$ is almost surely bounded.
Uniform integrability
To see this, note that $E|X| \le b + k(b)$ for all $X$ in $\mathcal{K}$, and use the uniform integrability of $\mathcal{K}$ to choose a finite $b$ such that $k(b) \le 1$. d) But boundedness in $L^1$ is insufficient for uniform integrability. In particular, as noted earlier, it shows that boundedness in $L^p$ for some $p > 1$ implies uniform integrability.
Sigma-algebras generated by random variables
This section is on σ-algebras generated by random variables and measurability with respect to them. We will also argue that such a σ-algebra should be thought of as a collection of information, and measurability with respect to it should be equated with being determined by that information.
Measurability
Conversely, if $X$ is a random variable taking values in the product space $(E, \mathcal{E})$, then for each $t$ the map $\omega \mapsto X_t(\omega)$ is a random variable with values in $(E_t, \mathcal{E}_t)$ and is called the $t$-coordinate of $X$. For each $n$ in $\mathbb{N}^*$, let $X_n$ be a random variable taking values in a measurable space $(E_n, \mathcal{E}_n)$.
Heuristics
Of course, the basic theorem of this section, Theorem 4.4, is embedded in the heuristic, which now becomes obvious: if the information $\mathcal{G}$ consists of knowing $X$, then $\mathcal{G}$ determines exactly those variables $V$ that are deterministic functions of $X$. Another result that becomes obvious is Proposition 4.3: in its setting, since knowing $X$ is the same as knowing $X_t$ for every $t$ in $T$, the information generated by $X$ is the same as the information generated by $X_t$, $t \in T$.
Filtrations
In this case, the information $\mathcal{G}$ is generated by $\{X_1, X_2, X_3\}$ in the sense that knowing $X_1, X_2, X_3$ is equivalent to knowing the information $\mathcal{G}$. Recall that $\sigma X$ is the σ-algebra on $\Omega$ generated by $X$; here $X$ can be a random variable or a collection of random variables.
Definitions
For random variables, the concept reduces to the earlier definition: they are independent if and only if their joint distribution is the product of their marginal distributions. As usual, if G is a sub-σ-algebra of H, we consider it both as a set of events and as the set of all numerical random variables measurable with respect to it.
Independence of σ -algebras
Independence of collections
Pairwise independence
Independence of random variables
$X_1, \ldots, X_n$ are independent if and only if their joint distribution is the product of their marginal distributions.
Sums of independent random variables
Kolmogorov’s 0-1 law
As a consequence, assuming that the $\mathcal{G}_n$ are independent, for every random variable $V$ in the tail σ-algebra there is a constant $c$ in $\bar{\mathbb{R}}$ such that $V = c$ almost surely. In the same example, the next statement will imply that the events $\{\limsup S_n > b\}$ and $\{S_n \in B \text{ i.o.}\}$ have probability 0 or 1, even though they are not in the tail $\mathcal{T}$, provided that to the independence of the $X_n$ we add the extra condition that they all have the same distribution.
Hewitt-Savage 0-1 law
From the four combinations, rejecting the impossible case where $\liminf S_n = +\infty$ and $\limsup S_n = -\infty$, we arrive at the result. If $X_1$ and $-X_1$ have the same distribution (as in the Gaussian case with mean 0), then $(S_n)$ and $(-S_n)$ have the same law, and it follows that cases (i) and (ii) are impossible.
Complements: Bernoulli sequences
The same is true with Laplace transforms. When $X$ and $Y$ are positive integer-valued, the same is true with generating functions. If these two numbers are equal to the same number $x$, then $(x_n)$ is said to have the limit $x$, and we write $\lim x_n = x$ or $x_n \to x$ to indicate this.
Characterization
The goal here is to review the concept of convergence in $\bar{\mathbb{R}}$ and to gather some useful results from analysis. The sequence is said to be convergent in $\mathbb{R}$, or simply convergent, if the limit exists and is a real number.
Cauchy criterion
Subsequences, selection principle
If every subsequence that has a limit has the same value $x$ for the limit, then the sequence tends to that same $x$ (infinite values are allowed for $x$). If the sequence is bounded and every convergent subsequence of it has the same limit $x$, then the sequence converges to $x$.
Diagonal method
Obviously, $(x_n)$ tends to a limit $x$ (infinite values are allowed for the limit) if and only if every subsequence of it tends to the same limit $x$. The next statement, called the selection principle, is immediate from the observation that for every sequence $(x_n)$ there is a subsequence whose limit is $\liminf x_n$.
Helly’s Theorem
Kronecker’s Lemma
The sequence $(X_n)$ is almost surely convergent if and only if $\Omega_0$ is almost sure, that is, $P(\Omega_0) = 1$. Moreover, if $\Omega_0$ is almost sure, then letting $X(\omega) = \lim X_n(\omega)$ for $\omega$ in $\Omega_0$ and $X(\omega) = 0$ for $\omega \notin \Omega_0$, we get a real-valued random variable $X$ such that $X_n \to X$ almost surely.
Characterization theorem
A sequence $(X_n)$ is said to be almost surely convergent if the numerical sequence $(X_n(\omega))$ is convergent for almost every $\omega$; it is said to converge to $X$ almost surely if $X$ is a real random variable and $\lim X_n(\omega) = X(\omega)$ for almost every $\omega$. Of course, if $X'$ is another random variable such that $X' = X$ almost surely, then $X_n \to X'$ almost surely as well.
Borel-Cantelli lemmas
By the Borel-Cantelli lemma, the assumption implies that the condition of Proposition 1.5 holds for the sequence $(x_n) = (X_n(\omega))$ for almost every $\omega$.
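A quick simulation of the convergence half (the setup is ours): with independent events $H_n$ of probability $1/n^2$, $\sum_n P(H_n) < \infty$, so almost every $\omega$ belongs to only finitely many $H_n$.

```python
import numpy as np

# Borel-Cantelli, convergence part: P(H_n) = 1/n^2 has a finite sum
# (pi^2/6 ~ 1.64), so almost surely only finitely many H_n occur.
rng = np.random.default_rng(2)
n = np.arange(1, 10_001)
occurs = rng.random((1000, n.size)) < 1.0 / n**2   # each row is one omega
counts = occurs.sum(axis=1)
print("mean number of H_n occurring:", counts.mean())  # ~ 1.64
print("max over 1000 sample paths:  ", counts.max())   # stays small
```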
Borel-Cantelli: divergence part
The sequence $(X_n)$ is almost surely convergent if and only if $\lim_{m,n \to \infty} |X_n - X_m| = 0$ almost surely. The following are equivalent: $(X_n)$ is almost surely convergent; $(Y_n)$ converges to 0 almost surely; $(Z_n)$ converges to 0 almost surely.
Convergence in metric spaces
If it converges to $X$ in probability, then it has a subsequence that converges to $X$ almost surely. Theorem 2.7 therefore applies to the subsequence $(X_{n_k})$ to conclude that it converges to 0 almost surely. c) Assume that every subsequence of $(X_n)$ has a further subsequence that converges to 0 almost surely.
Convergence and continuous functions
Then $i_\varepsilon \circ X_n \to 0$ almost surely, which implies via the bounded convergence theorem that $p_n \to 0$. b) Suppose that $X_n \to 0$ in probability. Since $X_n \to X$ in probability along $N$, Theorem 3.3b implies that $N$ has a subsequence $N'$ along which $X_n \to X$ almost surely.
Convergence and arithmetic operations
Metric for convergence in probability
Cauchy Criterion
If $(X_n)$ diverges to $+\infty$ and $(Y_n)$ converges to $Y$, both in probability, then $(X_n + Y_n)$ diverges to $+\infty$ in probability. Also, if the sequence converges to $X$ in $L^p$, then it converges to the same $X$ in probability: by Markov's inequality, for every $\varepsilon > 0$, $P\{|X_n - X| > \varepsilon\} \le E|X_n - X|^p / \varepsilon^p \to 0$.
Convergence, Cauchy, uniform integrability
To show that the sequence is uniformly integrable, we use the ε-δ characterization of Theorem II.3.14: fix $\varepsilon > 0$. Since $P(H_n) \to 0$ by the assumed convergence in probability, this completes the proof that the sequence is uniformly integrable.
A variation on the main results
The sequence $(X_n)$ is said to converge in distribution to $X$ if $(\mu_n)$ converges weakly to $\mu$, that is, if $Ef(X_n) \to Ef(X)$ for every bounded continuous $f$. a) Convergence in probability (or in $L^1$, or almost surely) implies convergence in distribution. Convergence in distribution is simply a convenient turn of phrase for the weak convergence of the corresponding probability measures.
Uniqueness of weak limits and equality of measures
Convergence of quantiles and distribution functions
Almost sure representations of weak convergence
It is worth noting the absence here of the third statement in 4.9, the one about the convergence of $(X_n)$ to $X$ in $L^1$. This is because convergence in $L^1$ involves the joint distributions $\pi_n$ of the pairs $(X_n, X)$, and we have no guarantee that the joint distribution of $Y_n$ and $Y$ is $\pi_n$ for each $n$.
Convergence of image measures
Such representations elevate convergence in distribution to the level of almost sure convergence in situations where the desired results concern only the distributions $\mu_n$ and $\mu$.
Tightness and Prohorov’s Theorem
Corresponding to the limit of the distribution functions there is a measure $\mu$, which we will show to be a probability measure.
Convergence of Fourier transforms
Thus, for every $\varepsilon > 0$ there is $b > 0$ such that the right-hand side is less than $\varepsilon/2$. In other words, every subsequence of $(\mu_n)$ has a further subsequence that converges weakly to the same probability measure $\mu$ (whose Fourier transform is $f$).
Convergence of characteristic functions
Assuming that $ES_n = np_n \to c$, show that the distribution of $S_n$ converges weakly to the Poisson distribution with mean $c$. Assume that the $X_n$ are pairwise independent and identically distributed with finite mean $a$ and finite variance $b$.
Strong law of large numbers
So it is sufficient to prove the statement under the further assumption that the $X_n$ and $X$ are positive. Then the previous proposition gives the proof if $EX = \infty$. Therefore, for the remainder of the proof we assume that $0 \le X < \infty$ and $EX < \infty$.
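A one-path simulation makes the pathwise nature of the statement visible (the distribution is ours, chosen for illustration):

```python
import numpy as np

# Strong law of large numbers: for i.i.d. X_n with EX = 2, the sample
# means S_n / n converge to 2 along almost every single path.
rng = np.random.default_rng(3)
x = rng.exponential(scale=2.0, size=1_000_000)        # EX = 2
means = np.cumsum(x) / np.arange(1, x.size + 1)
for n in (10, 1_000, 100_000, 1_000_000):
    print(f"n = {n:8d}   S_n/n = {means[n - 1]:.4f}")  # -> 2.0
```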
Weak law of large numbers
For each discontinuity point $x$ of $c$ there is an almost sure event $\Omega_x$ such that $c_n(\omega, x) - c_n(\omega, x-) \to c(x) - c(x-)$ for each $\omega$ in $\Omega_x$; this is 6.15 applied with $A = \{x\}$. Thus, if $\omega$ belongs to the almost sure event $\Omega_0 \cap \Omega_0'$, part (a) applies to show that $c_n(\omega, x) \to c(x)$ uniformly in $x$.
Inequalities for maxima
All the results below are for the case where the $X_n$ are independent, in which case Kolmogorov's 0-1 law applies and the convergence of the series has probability 0 or 1; the latter case is our goal. Assume that the $X_n$ are independent, have zero mean, and are dominated by a constant $b$.
Convergence of series and variances
Let $(Y_n)$ be independent of $(X_n)$ and have the same law. Then $\sum (X_n - a_n)$ is almost surely convergent, $\sum (Y_n - a_n)$ converges almost surely, and the sequence $(X_n - Y_n)_{n \ge 1}$ is bounded with $E(X_n - Y_n) = 0$ for all $n$.
Kolmogorov’s three series theorem
The latter implies that $\sum X_n$ converges almost surely, because the convergence of the first series shows, via Borel-Cantelli 2.5, that for almost every $\omega$ the numbers $X_n(\omega)$ and $Y_n(\omega)$ differ for at most finitely many $n$. Indeed, $\sum 1_{\{X_n \ne Y_n\}} < \infty$ almost surely, and the independence of the $X_n$ implies that the events $\{X_n \ne Y_n\}$ are independent.
Application to strong laws
The independence of the $X_n$ implies the independence of the $Y_n$. The convergence of the third series then follows via Theorem 7.5. From the divergence part of the Borel-Cantelli lemma, Theorem 2.9b, it follows that the first series in 7.10 must converge.
Triangular arrays
Assuming that $a = 0$ and $b = 1$, which is without loss of generality, the essential idea is the following: for large $n$, consider the variable $Z_n = S_n/\sqrt{n}$. For each $n$ it is always assumed that the variables on the $n$th row are independent.
Liapunov’s Theorem
Returning to Liapunov's theorem, we note that the normalization hypotheses on the means and variances are harmless. Assume that for each $n$ and $j$ there is a constant $c_{nj}$ such that $|X_{nj}| \le c_{nj}$, and that $\lim_n \sup_j c_{nj} = 0$.
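A numerical sketch of such a bounded triangular array (the distribution is our choice): row $n$ holds $n$ independent variables uniform on $(-h_n, h_n)$ with $h_n = \sqrt{3/n}$, so each row has zero means, total variance 1, and bounds $c_{nj} = h_n \to 0$; the row sums should be nearly standard Gaussian.

```python
import numpy as np

# Triangular array: row n has n independent uniforms on (-h, h) with
# h = sqrt(3/n), hence Var = h^2/3 = 1/n per entry and 1 per row sum.
rng = np.random.default_rng(9)
n, trials = 400, 10_000
h = np.sqrt(3.0 / n)
z = rng.uniform(-h, h, size=(trials, n)).sum(axis=1)   # row sums Z_n
print("mean ~ 0:", round(z.mean(), 3), "  var ~ 1:", round(z.var(), 3))
print("P(Z <= 1.96) ~ 0.975:", (z <= 1.96).mean())
```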
Lindeberg’s Theorem
In addition to ensuring that $Z_n \to Z$ in distribution, the Lindeberg condition implies that the $X_{nj}$ are uniformly small compared to $Z_n$ for large $n$.
Feller-Lévy theorem
Convergence to Poisson distribution
The latter is equivalent to proving that the corresponding Laplace transforms converge; that is what we must show. Thus, the contribution of $X_{nj}$ to the sum $Z_n$ is 0 or 1, which means that the $X_{nj}$ are almost Bernoulli variables.
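A simulation of the Poisson limit mentioned in the exercises (the parameters are ours): $S_n$ is a sum of $n$ independent Bernoulli variables with $np_n \to c$.

```python
import numpy as np
from math import exp, factorial

# S_n ~ Binomial(n, c/n) should be approximately Poisson(c) for large n.
rng = np.random.default_rng(4)
c, n, trials = 3.0, 10_000, 200_000
s = rng.binomial(n, c / n, size=trials)
for k in range(7):
    empirical = (s == k).mean()
    poisson = exp(-c) * c**k / factorial(k)
    print(f"k = {k}   empirical {empirical:.4f}   Poisson {poisson:.4f}")
```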
Convergence to infinitely divisible variables
In Lemma 8.27 below, we will show that there exists a subsequence $K$ such that $(S_{1,k})$ converges in distribution along $K$ to some random variable $Y_1$. Then, since $S_{j,k}$ has the same distribution as $S_{1,k}$, the sequence $(S_{j,k})$ converges in distribution along the same $K$ to some variable $Y_j$ for $j = 1, 2, \ldots$.
Preparatory steps
As usual, we treat $\mathcal{H}$ both as the collection of events and as the collection of all $\mathcal{H}$-measurable random variables. Similarly, for a sub-σ-algebra $\mathcal{F}$ of $\mathcal{H}$, we treat $\mathcal{F}$ as the collection of all $\mathcal{F}$-measurable random variables and $\mathcal{F}_+$ as the subset of positive ones.
Definition of conditional expectations
For the same reason, $E_{\mathcal{F}}X$ should be regarded as a multi-purpose notation standing for every version of $\bar{X}$. Some authors take the additional logical step of defining "the conditional expectation" to be the equivalence class of all versions, then use $E_{\mathcal{F}}X$ to denote a representative of that class and write "$E_{\mathcal{F}}X = \bar{X}$ almost surely" to mean that $\bar{X}$ is a version. e) Integrability. For this reason, we may call $\bar{X}$ the orthogonal projection of $X$ onto $\mathcal{F}$, and we call the defining property 1.4b the "projection property" to suggest this picture.
Existence of conditional expectations
This uniqueness up to equivalence extends to $E_{\mathcal{F}}X$ for arbitrary $X$ for which $EX$ exists; see also (f) below. d) Language. Given the uniqueness up to equivalence, the definite article in "the conditional expectation" is a slight abuse of language.
Properties similar to expectations
On the measurable space $(\Omega, \mathcal{F})$, then, $P$ is a probability measure and $Q$ is a measure that is absolutely continuous with respect to $P$. Then each $\bar{X}_n$ is a version of $E_{\mathcal{F}}X_n$, and $(\bar{X}_n)$ is an increasing sequence in $\mathcal{F}_+$; let $\bar{X}$ be its limit.
Special properties
Conditioning as projection
Conditional expectations given random variables
Finally, the notation $E(X \mid Y = y)$ is used and read "the conditional expectation of $X$ given that $Y = y$" despite its annoying ambiguity. It is also used when $P\{Y = y\} = 0$ for all $y$, and the correct interpretation then is that it is a notation for $f(y)$, where $f$ is such that $f \circ Y = E_{\sigma Y}X$.
Regular versions
For example, if $\mathcal{F}$ is generated by a partition $(G_n)$ of $\Omega$, which is the case if $\mathcal{F}$ is generated by a random variable having values in a countable space, then $E_{\mathcal{F}}X = \sum_n 1_{G_n}\, E(1_{G_n}X)/P(G_n)$ over the cells with $P(G_n) > 0$.
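In this partition case the formula is easy to check numerically; a sketch (the variables are ours):

```python
import numpy as np

# Conditional expectation given the partition {Y = 0}, {Y = 1}, {Y = 2}:
# on each cell G, E_F X is the constant E(1_G X) / P(G).
rng = np.random.default_rng(5)
size = 100_000
y = rng.integers(0, 3, size=size)        # Y generates the partition
x = y + rng.normal(size=size)            # an integrable X

cond = np.zeros(size)
for i in range(3):
    cell = (y == i)
    cond[cell] = x[cell].mean()          # empirical E(1_G X) / P(G)

# Projection property: E(1_G X) = E(1_G E_F X) for each cell G.
for i in range(3):
    cell = (y == i)
    print(i, x[cell].sum() / size, cond[cell].sum() / size)
```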
Conditional distributions
Thus, to show that $\mathcal{D} = \mathcal{E}$ via the monotone class theorem, it suffices to show that $[-\infty, t] \in \mathcal{D}$ for every $t$ in $\mathbb{R}$. To show that it is the conditional distribution of $Y$ given $\mathcal{F}$, it remains to verify the projection property of 2.8.
Disintegrations
The preceding part applies to the real random variable $g \circ Y$ and shows the existence of a conditional distribution $\hat{L} : (\omega, B) \mapsto \hat{L}_\omega(B)$ from $(\Omega, \mathcal{F})$ to $(\hat{E}, \hat{\mathcal{E}})$ for $g \circ Y$ given $\mathcal{F}$. Since $Y$ takes values in the standard measurable space $(E, \mathcal{E})$, by Theorem 2.10 there is a regular version $L^o$ of the conditional distribution of $Y$ given $\mathcal{F} = \sigma X$.
Conditional densities
As an illustration of the computations discussed above, we now derive the conditional distribution of $Y$ given $X = Y + Z$. To repeat: given the sum $X = Y + Z$, the conditional distribution of $Y$, namely $B \mapsto K(X, B)$, is the Gaussian distribution with mean $\frac{1}{2}X$ and variance $\frac{1}{2}$.
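A Monte Carlo check of this computation (the binning tolerance is ours):

```python
import numpy as np

# With Y, Z independent standard Gaussians and X = Y + Z, the conditional
# law of Y given X should be Gaussian with mean X/2 and variance 1/2.
rng = np.random.default_rng(6)
y = rng.normal(size=2_000_000)
z = rng.normal(size=2_000_000)
x = y + z

for x0 in (-1.0, 0.0, 1.5):
    sel = np.abs(x - x0) < 0.02                      # condition on X near x0
    print(f"X ~ {x0:+.1f}:  mean(Y) = {y[sel].mean():+.3f}"
          f" (expect {x0 / 2:+.3f}),  var(Y) = {y[sel].var():.3f} (expect 0.500)")
```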
Conditional independence of random variables etc
The definition of conditional independence and the previous theorem are stated in terms of positive random variables. In particular, the results of this chapter give the existence and construction of the Lebesgue measure on (0,1), and hence the existence of all measures ever discussed.
Construction of chains: description of data and goal
Now think about the conditions on $P$: it must assign a probability to each event, and do so in such a way that the countable additivity condition is satisfied for every disjoint sequence of events. The proofs are not illuminating, but the constructions leading to the theorem clarify many of the concepts discussed earlier.
Construction and analysis
After all, with respect to each random variable there are at least as many events as there are points in $\mathbb{R}$, and there are infinitely many random variables, and $\mathcal{H}$ must include all those events together with their complements and all countable unions and intersections thereof. Of course, $X_n$ takes values in $(E_n, \mathcal{E}_n)$; for each outcome $\omega$ of the experiment, $X_n(\omega)$ is the result of the $n$th trial. And $(X_0, \ldots, X_n)$ indicates the results of the trials up to and including the $n$th; it is a random variable taking values in $(F_n^o, \mathcal{F}_n^o)$ defined by 4.1.
Ionescu-Tulcea’s theorem
It follows that the map $P$ from $\mathcal{H}^o$ into $[0,1]$ is finitely additive: if $G$ and $H$ are in $\mathcal{H}^o$ and disjoint, then there is $n$ such that they both belong to $\mathcal{F}_n$, and therefore $P(G \cup H) = P(G) + P(H)$. Suppose for the time being that $\lim_k P(H_k) > 0$; we will show that this leads to a contradiction.
Initial distribution
Kolmogorov extension theorem
Note that $\hat\pi_n$ has the representation 4.4: this is trivial for $n = 0$ and, assuming it holds for $n$, it follows from the disintegration theorem 2.18 with $(D, \mathcal{D}) = (E, \mathcal{E})^{J_n}$ that it holds for $n+1$ as well (the standardness of $(E, \mathcal{E})$ is used here). Thus, the Ionescu-Tulcea theorem is used to show that there exists a probability measure $P_J$ on $(E, \mathcal{E})^J$ such that the image of $P_J$ under the projection $p^J_{J_n}$ is $\hat\pi_n$ for every $n$.
Independent sequences
We may take the $J_n$ all the same by replacing each $J_n$ with $J = \bigcup J_n$; and then $H = \{X_J \in A\}$ with $A = \bigcup A_n$, where the $A_n$ are disjoint. This section is devoted to special cases of the probability spaces constructed in the preceding section, as well as certain alternatives to such constructions.
Markov chains
Such chains can be made homogeneous in time by including time in the state space: let $\hat{E} = \mathbb{N} \times E$ and $\hat{\mathcal{E}} = 2^{\mathbb{N}} \otimes \mathcal{E}$, and define the Markov kernel $\hat{P}$ on $(\hat{E}, \hat{\mathcal{E}})$ accordingly. Then $\hat{X}_n = (n \bmod d, X_n)$, $n \in \mathbb{N}$, forms a time-homogeneous Markov chain with state space $(\hat{E}, \hat{\mathcal{E}})$ and transition kernel $\hat{P}$.
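On a finite state space, a time-homogeneous chain is simulated by drawing $X_{n+1}$ from the row $P(X_n, \cdot)$ of the transition kernel; a minimal sketch with a toy kernel of our own choosing:

```python
import numpy as np

# Transition kernel as a stochastic matrix: row x is the law of X_{n+1}
# given X_n = x.
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])

rng = np.random.default_rng(7)
x, n_steps = 0, 100_000                   # initial state X_0 = 0
visits = np.zeros(3)
for _ in range(n_steps):
    x = rng.choice(3, p=P[x])             # draw X_{n+1} from row P[X_n]
    visits[x] += 1
print("occupation frequencies:", visits / n_steps)
```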
Markov processes, continuous time
For the Markov chains introduced so far, the conditional distribution of $X_{n+1}$ given the entire past $\mathcal{F}_n$ depended only on the last state $X_n$. In some applications, for example if $X_n$ is to denote the weather on day $n$, it is desirable that the dependence on the past be somewhat deeper: for a fixed integer $k \ge 2$, the conditional distribution of $X_{n+1}$ may be allowed to depend on the last $k$ states.
Random fields
This chapter aims to introduce the vocabulary for describing the evolution of random systems over time. Given a stochastic process $X = (X_t)_{t \in T}$, letting $\mathcal{F}_t = \sigma\{X_s : s \le t\}$ for each time $t$, we obtain a filtration $\mathcal{F} = (\mathcal{F}_t)_{t \in T}$; it is called the filtration generated by $X$.
Adaptedness
Stopping times
If one can build an alarm system that sounds exactly at time $T$, and only at time $T$, then $T$ is a stopping time of $\mathcal{F}$. Heuristically speaking, $T_k$ is a stopping time because it is possible to build an alarm system that sounds exactly at the time of the $k$th arrival.
Conventions for the end of time
That is, $V \in \mathcal{F}_T$ if and only if $V 1_{\{T \le t\}} \in \mathcal{F}_t$ for every $t$ in $\bar{T}$, which is the claim of the first statement. Indeed, the last paragraph of the preceding proof shows what is required of $Y$: for each $t$, the mapping $(s, \omega) \mapsto Y_s(\omega)$ from $B_t \times \Omega$ into $\bar{\mathbb{R}}$ must be $\mathcal{B}_t \otimes \mathcal{F}_t$-measurable.
Comparing different pasts
The information accumulated by $S$ must be less than that accumulated by $T$. The following shows this and gives further comparisons for general $S$ and $T$. 1.16 Theorem. Let $S$ and $T$ be stopping times of $\mathcal{F}$. Then: a) $S \wedge T$ and $S \vee T$ are stopping times of $\mathcal{F}$. i) Since $S$ and $T$ are stopping times, the events $\{S \le t\}$ and $\{T \le t\}$ are in $\mathcal{F}_t$ for each time $t$.
Times foretold
Approximation by discrete stopping times
Conditioning at stopping times
With the exception of the claim about repeated conditioning, these are all merely restatements of the definition of conditional expectations and Theorem IV.1.10. Show that an arbitrary time $T$ is a stopping time of $\mathcal{F}$ if and only if, for every pair of outcomes $\omega$ and $\omega'$, ...
Uniformly integrable martingales
In the special case where the $R_n$ are positive, the martingale $M$ is considered a reasonable model for the evolution of the price of a share of stock. Then $M_n$ stands for the price of the stock at time $n$, and $R_{n+1}$ is interpreted as the return at time $n+1$ per dollar invested in that stock at time $n$.
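A simulation of this multiplicative picture (the return distribution is ours, chosen so that $ER_n = 1$):

```python
import numpy as np

# Stock-price martingale: M_n = M_0 * R_1 * ... * R_n with i.i.d. positive
# returns satisfying E R = 1, so E(M_{n+1} | F_n) = M_n and E M_n = M_0.
rng = np.random.default_rng(8)
paths, n = 100_000, 50
r = rng.choice([0.8, 1.2], size=(paths, n))      # E R = 0.5*0.8 + 0.5*1.2 = 1
m = 100.0 * np.cumprod(r, axis=1)                # M_0 = 100
print("E M_n for n = 10, 25, 50:",
      m[:, 9].mean(), m[:, 24].mean(), m[:, 49].mean())   # each ~ 100
```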