The treatment is based on their use in the study of point processes, discontinuous martingales, Markov processes with jumps, and especially Lévy processes. The presentation is aligned in form and style with modern treatments of probability theory and stochastic processes.
Sigma-algebras
If the intersection of every countable collection of sets in $\mathcal{C}$ belongs to $\mathcal{C}$, then we say that $\mathcal{C}$ is closed under countable intersections.
A collection of subsets of $E$ is a σ-algebra if and only if it is both a p-system and a d-system on $E$. To show sufficiency, let $\mathcal{E}$ be a collection of subsets of $E$ that is both a p-system and a d-system.
Monotone class theorem
It is clear that a σ-algebra is both a p-system and a d-system, and the converse will be shown next. Second, it is closed under unions: $A, B \in \mathcal{E} \Rightarrow A \cup B \in \mathcal{E}$, because $A \cup B = (A^c \cap B^c)^c$ and $\mathcal{E}$ is closed under complements (as shown) and under intersections by the hypothesis that it is a p-system.
Measurable spaces
Products of measurable spaces
Exercises
Measurable functions
Composition of functions
The next proposition will be remembered by the expression "measurable functions of measurable functions are measurable". If $f$ is measurable with respect to $\mathcal{E}$ and $\mathcal{F}$, and $g$ with respect to $\mathcal{F}$ and $\mathcal{G}$, then $g \circ f$ is measurable with respect to $\mathcal{E}$ and $\mathcal{G}$.
Numerical functions
Positive and negative parts of a function
The decomposition $f = f^+ - f^-$ allows us to obtain many results for arbitrary functions from the corresponding results for positive functions.
Indicators and simple functions
Limits of sequences of functions
The rightmost member belongs to $\mathcal{E}$: for each $n$, the set $f_n^{-1}[-\infty, r]$ is in $\mathcal{E}$ by the $\mathcal{E}$-measurability of $f_n$, and $\mathcal{E}$ is closed under countable intersections.
Approximation of measurable functions
A positive function on $E$ is $\mathcal{E}$-measurable if and only if it is the limit of an increasing sequence of positive simple functions. We must show that there is a sequence $(f_n)$ of positive simple functions increasing to $f$.
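For concreteness, the classical choice is the dyadic approximation $f_n = \sum_{k=0}^{n2^n - 1} k\,2^{-n}\, 1_{\{k2^{-n} \le f < (k+1)2^{-n}\}} + n\, 1_{\{f \ge n\}}$. The following sketch evaluates this construction numerically (the test function and evaluation points are ours, for illustration only):

```python
import numpy as np

def dyadic_approximation(f, n):
    """n-th dyadic approximation of a positive function f:
    f_n = k / 2^n on {k/2^n <= f < (k+1)/2^n} for k < n * 2^n,
    and f_n = n on {f >= n}.  Each f_n is simple, and f_n increases
    pointwise to f as n -> infinity."""
    def fn(x):
        v = f(x)
        return float(n) if v >= n else np.floor(v * 2**n) / 2**n
    return fn

f = lambda x: x * x                       # a sample positive function
for n in (1, 2, 4, 8):
    fn = dyadic_approximation(f, n)
    print(n, [fn(x) for x in (0.3, 0.7, 1.5, 3.0)])   # increases toward f
```

Note that each $f_n$ takes only finitely many values, so it is simple, and the measurability of each level set follows from that of $f$.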
Monotone classes of functions
Standard measurable spaces
Notation
Exercises and complements
Show that a positive function $f$ on $E$ is $\mathcal{E}$-measurable if and only if it has the form $f = \sum_n c_n 1_{A_n}$ for some sequence $(c_n)$ in $\mathbb{R}_+$ and some sequence $(A_n)$ of sets in $\mathcal{E}$. We can think of $m(x)$ as the mass associated with the point $x$, and then $\mu(A) = \sum_{x \in A} m(x)$ is the total mass on the set $A$.
Some properties
Arithmetic of measures
Finite, σ-finite, Σ-finite measures
Specification of measures
Atoms, purely atomic measures, diffuse measures
Completeness, negligible sets
Almost everywhere
Hint: Let $(E_n)$ be a measurable partition of $E$ such that $\mu(E_n) < \infty$ for each $n$; define $\mu_n$ as the trace of $\mu$ on $E_n$ as in Exercise 3.11a; show that $\mu = \sum_n \mu_n$. b) Show that the Lebesgue measure on $\mathbb{R}$ is σ-finite. At the end we will also show that 4.2 characterizes integration. a) Let $f$ be simple and positive.
Integrability
If the Riemann integral exists, then so does the Lebesgue integral, and the two integrals are equal. The converse is false; the Lebesgue integral exists for a larger class of functions than the Riemann integral.
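A standard example of the gap is the indicator of the rationals in $[0,1]$: with $\lambda$ denoting the Lebesgue measure,
$$\int_{[0,1]} 1_{\mathbb{Q}} \, d\lambda = \lambda(\mathbb{Q} \cap [0,1]) = 0,$$
since $\mathbb{Q} \cap [0,1]$ is countable and hence $\lambda$-negligible; but every Riemann sum can be made equal to 0 or to 1 by the choice of evaluation points, so the Riemann integral does not exist.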
Integral over a set
Positivity and monotonicity
Monotone Convergence Theorem
As $(f_n)$ increases, the integrals $\mu f_n$ form an increasing sequence of numbers by the monotonicity property of Proposition 4.7. The following steps show that the reverse inequality also holds. a) Fix $b$ in $\mathbb{R}_+$ and $B$ in $\mathcal{E}$.
Linearity of integration
Insensitivity of the integral
Fatou’s lemma
Dominated convergence theorem
If $\lim f_n$ exists, then $\liminf f_n = \limsup f_n = \lim f_n$, and $\lim f_n$ is integrable since it is dominated by $g$.
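As a numerical illustration (the setup is ours): take $f_n(x) = x^n$ on $[0,1]$ with dominating function $g \equiv 1$; pointwise $f_n \to 0$ except at $x = 1$, so the limit is 0 almost everywhere and the integrals must tend to 0, as the theorem predicts.

```python
import numpy as np

# Dominated convergence, numerically: f_n(x) = x**n on [0, 1], dominated
# by g = 1.  Pointwise f_n(x) -> 0 for x < 1, so the limit function is 0
# Lebesgue-almost everywhere, and the integrals must converge to 0.
x = np.random.default_rng(0).uniform(0.0, 1.0, 1_000_000)  # Monte Carlo sample
for n in (1, 5, 25, 125, 625):
    print(f"n = {n:4d}   integral of f_n ~ {(x**n).mean():.6f}")  # exact: 1/(n+1)
```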
Almost everywhere versions
Likewise, $f_n \ge 0$ almost everywhere means that there is $M_n$ in $\mathcal{N}$ such that $f_n \ge 0$ outside $M_n$. The reader is invited to formulate the "almost everywhere version" of the dominated convergence theorem and to prove it carefully once.
Characterization of the integral
The necessity of the conditions is immediate from the properties of the integral: (a) follows from the definition of $\mu f$, (b) from linearity, and (c) from the monotone convergence theorem. In fact, the class of measures that can be represented as images of the Lebesgue measure on $\mathbb{R}_+$ is very large.
Indefinite integrals
Radon-Nikodym theorem
If $\nu$ is an indefinite integral of some positive $\mathcal{E}$-measurable function with respect to $\mu$, then it is clear from 5.5 that $\nu$ is absolutely continuous with respect to $\mu$. The function in question can be denoted by $d\nu/d\mu$ in view of the equivalence of 5.5-5.9 and 5.12; and the function $p$ is also called the Radon-Nikodym derivative of $\nu$ with respect to $\mu$.
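On a countable space the Radon-Nikodym derivative reduces to a ratio of point masses; here is a minimal sketch (the two measures are ours, chosen for illustration):

```python
# Radon-Nikodym on a finite set: if mu({x}) > 0 whenever nu({x}) > 0
# (absolute continuity), then p(x) = nu({x}) / mu({x}) satisfies
# nu(A) = integral of p over A with respect to mu, for every A.
mu = {"a": 0.2, "b": 0.3, "c": 0.5}
nu = {"a": 0.1, "b": 0.6, "c": 0.3}

p = {x: nu[x] / mu[x] for x in mu}        # the derivative d(nu)/d(mu)

A = {"a", "c"}
print(sum(nu[x] for x in A))              # nu(A)                      -> 0.4
print(sum(p[x] * mu[x] for x in A))       # indefinite integral over A -> 0.4
```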
A matter of style
If $\mu$ is absolutely continuous with respect to the Lebesgue measure $\lambda$ on $\mathbb{R}_+$, then its cumulative distribution function is differentiable at $\lambda$-almost every point of $\mathbb{R}_+$. Show that, conversely, if $\mu$ is absolutely continuous with respect to a finite measure $\nu$, then $\mu$ is Σ-finite.
Measure-kernel-function
This special case will inform the choice of notations such as $Kf$ and $\mu K$ below (recall that functions are thought of as generalizations of column vectors, and measures as generalizations of row vectors). To specify a kernel $K$ from $(E, \mathcal{E})$ to $(F, \mathcal{F})$, it is enough to specify $Kf$ for each $f$ in $\mathcal{F}_+$. b) $K(af + bg) = aKf + bKg$ for $f$ and $g$ in $\mathcal{F}_+$ and $a$ and $b$ in $\mathbb{R}_+$; c) $Kf_n \nearrow Kf$ for each sequence $(f_n)$ in $\mathcal{F}_+$ increasing to $f$.
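On finite spaces the vector analogy is literal: a kernel is a matrix with positive entries, $Kf$ is a matrix-vector product, and $\mu K$ is a row-vector-matrix product. A small sketch (the numbers are ours):

```python
import numpy as np

# Kernel from a 2-point space E to a 3-point space F: K[x, y] = K(x, {y}).
K = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.2, 0.7]])
f = np.array([1.0, 2.0, 4.0])    # positive function on F (column vector)
mu = np.array([0.3, 0.7])        # measure on E (row vector)

Kf = K @ f      # function on E:  (Kf)(x) = integral of f against K(x, .)
muK = mu @ K    # measure on F:   (muK)(B) = integral of K(., B) against mu
print(Kf, muK)
print(mu @ Kf, muK @ f)          # mu(Kf) == (mu K)(f): the two orders agree
```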
Products of kernels, Markov kernels
Kernels finite and bounded
Functions on product spaces
Property (a) is immediate from the linearity of integration with respect to $K_x$ for each $x$, and the continuity property (b) follows from the monotone convergence theorem for the measure $K_x$. Since the measurable rectangles generate the σ-algebra $\mathcal{E} \otimes \mathcal{F}$, it follows from the monotone class theorem 2.19 that $\mathcal{M}$ includes all positive (or bounded) $f$ in $\mathcal{E} \otimes \mathcal{F}$, assuming that $K$ is bounded.
Measures on the product space
Note that the measurable rectangles $E_m \times F_n$ form a partition of $E \times F$, and recall formula 6.13 for $\pi$ and $\hat\pi$. Thus, the measures $\pi$ and $\hat\pi$ are σ-finite, agree on a p-system of measurable rectangles generating $\mathcal{E} \otimes \mathcal{F}$, and this p-system contains a partition of $E \times F$ over which $\pi$ and $\hat\pi$ are finite.
Product measures and Fubini
Assuming that $\mu$ is σ-finite and $K$ is σ-bounded, it remains to be shown that $\pi$ is σ-finite and is the only measure that satisfies 6.13. To prove that $\pi f = \hat\pi \hat f$ it is sufficient to show that $(\mu_i \times \nu_j) f = (\nu_j \times \mu_i) \hat f$ for each pair $i$ and $j$.
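For finite discrete measures, the content of 6.13 is just the interchange of two orders of summation; a quick numerical check (setup ours):

```python
import numpy as np

rng = np.random.default_rng(1)
mu = rng.random(4)          # finite measure on a 4-point space E
nu = rng.random(5)          # finite measure on a 5-point space F
f = rng.random((4, 5))      # positive function on E x F

# pi f: integrate in y first, then in x; pi-hat f-hat: the other order.
first_y = (f * nu).sum(axis=1) @ mu
first_x = (f.T * mu).sum(axis=1) @ nu
print(first_y, first_x)     # equal up to rounding, as Fubini asserts
```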
Finite products
Infinite products
Since measurable rectangles generate the product σ-algebra $\mathcal{F}$, this implies by Proposition 2.3 that $f$ is measurable with respect to $\mathcal{H}$ and $\mathcal{F}$. In general, in order for 6.15 to hold, it is necessary that $\mu$ and $\nu$ be Σ-finite.
Complements
A probability space is a triple $(\Omega, \mathcal{H}, P)$ where $\Omega$ is a set, $\mathcal{H}$ is a σ-algebra on $\Omega$, and $P$ is a probability measure on $(\Omega, \mathcal{H})$. Setting up the correspondence between the experiment and the model requires a lot of thought and experience; it is rarely made explicit, and it determines the quality of the probability space as a model of the experiment in question.
Negligibility, completeness
These are all the same as before for arbitrary measures, except for the sequential continuity under decreasing limits, which is made possible by the finiteness of $P$: if $H_1 \supset H_2 \supset \cdots$ and $\bigcap_n H_n = H$, then the complements $H_n^c$ increase to $H^c$, implying that $P(H_n^c) \nearrow P(H^c)$ by the sequential continuity of measures under increasing limits; and we have $P(H) = 1 - P(H^c)$, and similarly for each $H_n$, by the finite additivity and normalization of $P$.
Almost surely, almost everywhere
Random variables
Distribution of a random variable
Functions of random variables
Joint distributions
Independence
An arbitrary (countable or uncountable) collection of random variables is said to be independent if every finite subset of it is independent.
Stochastic processes and probability laws
Let $\gamma_a$ denote the standard gamma distribution with shape index $a$; this is the probability measure $\mu$ of Example 1.13 above, but with $c = 1$.
Properties of expectation
The change in notation serves to highlight the important change in our interpretation of $EX$: the integral $PX$ is the "area under the function" $X$ in a generalized sense, whereas the expectation $EX$ is the "weighted average of the values" of $X$, the weight distribution being specified by $P$, with total weight $P(\Omega) = 1$. See Figure 2 above for the distinction.
Expectations and integrals
The converse statement is useful for figuring out the distribution of $X$ in cases where $X$ is a known function of other random variables whose joint distribution is known. For a measure $\mu$ to be the distribution of $X$, it is necessary and sufficient that $E(f \circ X) = \mu f$ for every positive $\mathcal{E}$-measurable $f$.
Means, variances, Laplace and Fourier transforms
Show that their sum of squares has the gamma distribution with shape index $n/2$ and scale $1/2$. Let $X$ and $Y$ be independent, $X$ with gamma distribution $\gamma_{a,c}$ (with shape index $a$ and scale parameter $c$) and $Y$ with the standard Gaussian distribution. Show that $\sqrt{b}\,Y$ has the Gaussian distribution with mean 0 and variance $b > 0$.
Inequalities
For each $p$ in $[1, \infty]$, let $L^p$ denote the collection of all real-valued random variables $X$ with $\|X\|_p < \infty$. For $p$ in $[1, \infty)$, $X$ is in $L^p$ if and only if $|X|^p$ is integrable; and $X$ is in $L^\infty$ if and only if $X$ is almost surely bounded.
Uniform integrability
To see this, note that $E|X| \le b + k(b)$ for all $X$ in $\mathcal{K}$, and use the uniform integrability of $\mathcal{K}$ to choose a finite $b$ such that $k(b) \le 1$. d) But boundedness in $L^1$ is insufficient for uniform integrability. In particular, as noted earlier, it shows that boundedness in $L^p$ for some $p > 1$ implies uniform integrability.
Sigma-algebras generated by random variables
This section is on σ-algebras generated by random variables and measurability with respect to them. We will also argue that such a σ-algebra should be thought of as a collection of information, and measurability with respect to it should be equated with being determined by that information.
Measurability
Conversely, if $X$ is a random variable taking values in the product space $(E, \mathcal{E})$, then for each $t$ the map $\omega \mapsto X_t(\omega)$ is a random variable with values in $(E_t, \mathcal{E}_t)$ and is called the $t$-coordinate of $X$. For each $n$ in $\mathbb{N}^*$, let $X_n$ be a random variable taking values in a measurable space $(E_n, \mathcal{E}_n)$.
Heuristics
Of course, the basic theorem of this section, Theorem 4.4, is embedded in the heuristic, which now becomes obvious: if the information $\mathcal{G}$ consists of knowing $X$, then $\mathcal{G}$ determines exactly those variables $V$ that are deterministic functions of $X$. Another result that becomes obvious is Proposition 4.3: in its setting, since knowing $X$ is the same as knowing $X_t$ for every $t$ in $T$, the information generated by $X$ is the same as the information generated by $X_t$, $t \in T$.
Filtrations
In this case, the information $\mathcal{G}$ is generated by $\{X_1, X_2, X_3\}$ in the sense that knowing $X_1, X_2, X_3$ is equivalent to knowing the information $\mathcal{G}$. Recall that $\sigma X$ is the σ-algebra on $\Omega$ generated by $X$; here $X$ can be a random variable or a collection of random variables.
Definitions
For random variables, the concept reduces to the earlier definition: they are independent if and only if their joint distribution is the product of their marginal distributions. As usual, if G is a sub-σ-algebra of H, we consider it both as a set of events and as the set of all numerical random variables measurable with respect to it.
Independence of σ -algebras
Independence of collections
Pairwise independence
Independence of random variables
$X_1, \ldots, X_n$ are independent if and only if their joint distribution is the product of their marginal distributions.
Sums of independent random variables
Kolmogorov’s 0-1 law
As a consequence, assuming that the $\mathcal{G}_n$ are independent, for every random variable $V$ in the tail σ-algebra there is a constant $c$ in $\bar{\mathbb{R}}$ such that $V = c$ almost surely. In the same example, the next statement will imply that the events $\{\limsup S_n > b\}$ and $\{S_n \in B \text{ i.o.}\}$ have probability 0 or 1, even though they are not in the tail $\mathcal{T}$, provided that to the independence of the $X_n$ we add the extra condition that they all have the same distribution.
Hewitt-Savage 0-1 law
From the four combinations, rejecting the impossible case where $\liminf S_n = +\infty$ and $\limsup S_n = -\infty$, we arrive at the result. If $X_1$ and $-X_1$ have the same distribution (as in the Gaussian case with mean 0), then $(S_n)$ and $(-S_n)$ have the same law, and it follows that cases (i) and (ii) are impossible.
Complements: Bernoulli sequences
The same is true with Laplace transforms. When $X$ and $Y$ are positive integer-valued, the same is true with generating functions. If these two numbers are equal to the same number $x$, then $(x_n)$ is said to have the limit $x$, and we write $\lim x_n = x$ or $x_n \to x$ to indicate this.
Characterization
The goal here is to review the concept of convergence in $\bar{\mathbb{R}}$ and to gather some useful results from analysis. The sequence is said to be convergent in $\mathbb{R}$, or simply convergent, if the limit exists and is a real number.
Cauchy criterion
Subsequences, selection principle
If every subsequence that has a limit has the same value $x$ for the limit, then the sequence tends to that same $x$ (infinite values are allowed for $x$). If the sequence is bounded and every convergent subsequence of it has the same limit $x$, then the sequence converges to $x$.
Diagonal method
Obviously, $(x_n)$ tends to a limit $x$ (infinite values are allowed for the limit) if and only if every subsequence of it tends to the same limit $x$. The next statement, called the selection principle, is immediate from the observation that for every sequence $(x_n)$ there is a subsequence whose limit is $\liminf x_n$.
Helly’s Theorem
Kronecker’s Lemma
The sequence $(X_n)$ is almost surely convergent if and only if $\Omega_0$ is almost sure, that is, $P(\Omega_0) = 1$. Moreover, if $\Omega_0$ is almost sure, then letting $X(\omega) = \lim X_n(\omega)$ for $\omega$ in $\Omega_0$ and $X(\omega) = 0$ for $\omega \notin \Omega_0$, we get a real-valued random variable $X$ such that $X_n \to X$ almost surely.
Characterization theorem
A sequence $(X_n)$ is said to be almost surely convergent if the numerical sequence $(X_n(\omega))$ is convergent for almost every $\omega$; it is said to converge to $X$ almost surely if $X$ is a real random variable and $\lim X_n(\omega) = X(\omega)$ for almost every $\omega$. Of course, if $X'$ is another random variable such that $X' = X$ almost surely, then $X_n \to X'$ almost surely as well.
Borel-Cantelli lemmas
By the Borel-Cantelli lemma, the assumption implies that the condition of Proposition 1.5 holds for the sequence $(x_n) = (X_n(\omega))$ for almost every $\omega$.
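A quick simulation of the convergence half (the setup is ours): with independent events $H_n$ of probability $1/n^2$, $\sum_n P(H_n) < \infty$, so almost every $\omega$ belongs to only finitely many $H_n$.

```python
import numpy as np

# Borel-Cantelli, convergence part: P(H_n) = 1/n^2 has a finite sum
# (pi^2/6 ~ 1.64), so almost surely only finitely many H_n occur.
rng = np.random.default_rng(2)
n = np.arange(1, 10_001)
occurs = rng.random((1000, n.size)) < 1.0 / n**2   # each row is one omega
counts = occurs.sum(axis=1)
print("mean number of H_n occurring:", counts.mean())  # ~ 1.64
print("max over 1000 sample paths:  ", counts.max())   # stays small
```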
Borel-Cantelli: divergence part
The sequence $(X_n)$ is almost surely convergent if and only if $\lim_{m,n \to \infty} |X_n - X_m| = 0$ almost surely. The following are equivalent: $(X_n)$ is almost surely convergent; $(Y_n)$ converges to 0 almost surely; $(Z_n)$ converges to 0 almost surely.
Convergence in metric spaces
If it converges to $X$ in probability, then it has a subsequence that converges to $X$ almost surely. Theorem 2.7 therefore applies to the subsequence $(X_{n_k})$ to conclude that it converges to 0 almost surely. c) Assume that every subsequence of $(X_n)$ has a further subsequence that converges to 0 almost surely.
Convergence and continuous functions
Then $i_\varepsilon \circ X_n \to 0$ almost surely, which implies via the bounded convergence theorem that $p_n \to 0$. b) Suppose that $X_n \to 0$ in probability. Since $X_n \to X$ in probability along $N$, Theorem 3.3b implies that $N$ has a subsequence $N'$ along which $X_n \to X$ almost surely.
Convergence and arithmetic operations
Metric for convergence in probability
Cauchy Criterion
If $(X_n)$ diverges to $+\infty$ and $(Y_n)$ converges to $Y$, both in probability, then $(X_n + Y_n)$ diverges to $+\infty$ in probability. Also, if the sequence converges to $X$ in $L^p$, then it converges to the same $X$ in probability: by Markov's inequality, for every $\varepsilon > 0$, $P\{|X_n - X| > \varepsilon\} \le E|X_n - X|^p / \varepsilon^p \to 0$.
Convergence, Cauchy, uniform integrability
To show that the sequence is uniformly integrable, we use the ε-δ characterization of Theorem II.3.14: fix $\varepsilon > 0$. Since $P(H_n) \to 0$ by the assumed convergence in probability, this completes the proof that the sequence is uniformly integrable.
A variation on the main results
The sequence $(X_n)$ is said to converge in distribution to $X$ if $(\mu_n)$ converges weakly to $\mu$, that is, if $Ef(X_n) \to Ef(X)$ for every bounded continuous $f$. a) Convergence in probability (or in $L^1$, or almost surely) implies convergence in distribution. Convergence in distribution is simply a convenient turn of phrase for the weak convergence of the corresponding probability measures.
Uniqueness of weak limits and equality of measures
Convergence of quantiles and distribution functions
Almost sure representations of weak convergence
It is worth noting the absence here of the third statement in 4.9, the one about the convergence of $(X_n)$ to $X$ in $L^1$. This is because convergence in $L^1$ involves the joint distributions $\pi_n$ of the pairs $(X_n, X)$, and we have no guarantee that the joint distribution of $Y_n$ and $Y$ is $\pi_n$ for each $n$.
Convergence of image measures
Such representations elevate convergence in distribution to the level of almost sure convergence in situations where the desired results concern only the distributions $\mu_n$ and $\mu$.
Tightness and Prohorov’s Theorem
Corresponding to the limit of the distribution functions there is a measure $\mu$, which we will show to be a probability measure.
Convergence of Fourier transforms
Thus, for every $\varepsilon > 0$ there is $b > 0$ such that the right-hand side is less than $\varepsilon/2$. In other words, every subsequence of $(\mu_n)$ has a further subsequence that converges weakly to the same probability measure $\mu$ (whose Fourier transform is $f$).
Convergence of characteristic functions
Assuming that $ES_n = np_n \to c$, show that the distribution of $S_n$ converges weakly to the Poisson distribution with mean $c$. Assume that the $X_n$ are pairwise independent and identically distributed with finite mean $a$ and finite variance $b$.
Strong law of large numbers
So it is sufficient to prove the statement under the further assumption that the $X_n$ and $X$ are positive. Then the previous proposition gives the proof if $EX = \infty$. Therefore, for the remainder of the proof we assume that $0 \le X < \infty$ and $EX < \infty$.
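A one-path simulation makes the pathwise nature of the statement visible (the distribution is ours, chosen for illustration):

```python
import numpy as np

# Strong law of large numbers: for i.i.d. X_n with EX = 2, the sample
# means S_n / n converge to 2 along almost every single path.
rng = np.random.default_rng(3)
x = rng.exponential(scale=2.0, size=1_000_000)        # EX = 2
means = np.cumsum(x) / np.arange(1, x.size + 1)
for n in (10, 1_000, 100_000, 1_000_000):
    print(f"n = {n:8d}   S_n/n = {means[n - 1]:.4f}")  # -> 2.0
```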
Weak law of large numbers
For each discontinuity point $x$ of $c$ there is an almost sure event $\Omega_x$ such that $c_n(\omega, x) - c_n(\omega, x-) \to c(x) - c(x-)$ for each $\omega$ in $\Omega_x$; this is 6.15 applied with $A = \{x\}$. Thus, if $\omega$ belongs to the almost sure event $\Omega_0 \cap \Omega_0'$, part (a) applies to show that $c_n(\omega, x) \to c(x)$ uniformly in $x$.
Inequalities for maxima
All the results below are for the case where the $X_n$ are independent, in which case Kolmogorov's 0-1 law applies and the convergence of the series has probability 0 or 1; the latter case is our goal. Assume that the $X_n$ are independent, have zero mean, and are dominated by a constant $b$.
Convergence of series and variances
Let $(Y_n)$ be independent of $(X_n)$ and have the same law. Then $\sum (X_n - a_n)$ is almost surely convergent, $\sum (Y_n - a_n)$ converges almost surely, and the sequence $(X_n - Y_n)_{n \ge 1}$ is bounded with $E(X_n - Y_n) = 0$ for all $n$.
Kolmogorov’s three series theorem
The latter implies that $\sum X_n$ converges almost surely, because the convergence of the first series shows, via Borel-Cantelli 2.5, that for almost every $\omega$ the numbers $X_n(\omega)$ and $Y_n(\omega)$ differ for at most finitely many $n$. Indeed, $\sum 1_{\{X_n \ne Y_n\}} < \infty$ almost surely, and the independence of the $X_n$ implies that the events $\{X_n \ne Y_n\}$ are independent.
Application to strong laws
The independence of the $X_n$ implies the independence of the $Y_n$. The convergence of the third series then follows via Theorem 7.5. From the divergence part of the Borel-Cantelli lemma, Theorem 2.9b, it follows that the first series in 7.10 must converge.
Triangular arrays
Assuming that $a = 0$ and $b = 1$, which is without loss of generality, the essential idea is the following: for large $n$, consider the variable $Z_n = S_n/\sqrt{n}$. For each $n$ it is always assumed that the variables on the $n$th row are independent.
Liapunov’s Theorem
Returning to Liapunov's theorem, we note that the normalization hypotheses on the means and variances are harmless. Assume that for each $n$ and $j$ there is a constant $c_{nj}$ such that $|X_{nj}| \le c_{nj}$, and that $\lim_n \sup_j c_{nj} = 0$.
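A numerical sketch of such a bounded triangular array (the distribution is our choice): row $n$ holds $n$ independent variables uniform on $(-h_n, h_n)$ with $h_n = \sqrt{3/n}$, so each row has zero means, total variance 1, and bounds $c_{nj} = h_n \to 0$; the row sums should be nearly standard Gaussian.

```python
import numpy as np

# Triangular array: row n has n independent uniforms on (-h, h) with
# h = sqrt(3/n), hence Var = h^2/3 = 1/n per entry and 1 per row sum.
rng = np.random.default_rng(9)
n, trials = 400, 10_000
h = np.sqrt(3.0 / n)
z = rng.uniform(-h, h, size=(trials, n)).sum(axis=1)   # row sums Z_n
print("mean ~ 0:", round(z.mean(), 3), "  var ~ 1:", round(z.var(), 3))
print("P(Z <= 1.96) ~ 0.975:", (z <= 1.96).mean())
```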
Lindeberg’s Theorem
In addition to ensuring that $Z_n \to Z$ in distribution, the Lindeberg condition implies that the $X_{nj}$ are uniformly small compared to $Z_n$ for large $n$.
Feller-Lévy theorem
Convergence to Poisson distribution
The latter is equivalent to proving that the corresponding Laplace transforms converge; that is what we must show. Thus, the contribution of $X_{nj}$ to the sum $Z_n$ is 0 or 1, which means that the $X_{nj}$ are almost Bernoulli variables.
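A simulation of the Poisson limit mentioned in the exercises (the parameters are ours): $S_n$ is a sum of $n$ independent Bernoulli variables with $np_n \to c$.

```python
import numpy as np
from math import exp, factorial

# S_n ~ Binomial(n, c/n) should be approximately Poisson(c) for large n.
rng = np.random.default_rng(4)
c, n, trials = 3.0, 10_000, 200_000
s = rng.binomial(n, c / n, size=trials)
for k in range(7):
    empirical = (s == k).mean()
    poisson = exp(-c) * c**k / factorial(k)
    print(f"k = {k}   empirical {empirical:.4f}   Poisson {poisson:.4f}")
```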
Convergence to infinitely divisible variables
In Lemma 8.27 below, we will show that there exists a subsequence $K$ such that $(S_{1,k})$ converges in distribution along $K$ to some random variable $Y_1$. Then, since $S_{j,k}$ has the same distribution as $S_{1,k}$, the sequence $(S_{j,k})$ converges in distribution along the same $K$ to some variable $Y_j$ for $j = 1, 2, \ldots$.
Preparatory steps
As usual, we treat $\mathcal{H}$ both as the collection of events and as the collection of all $\mathcal{H}$-measurable random variables. Similarly, for a sub-σ-algebra $\mathcal{F}$ of $\mathcal{H}$, we treat $\mathcal{F}$ as the collection of all $\mathcal{F}$-measurable random variables and $\mathcal{F}_+$ as the subset of positive ones.
Definition of conditional expectations
For the same reason, $E_{\mathcal{F}}X$ should be regarded as a multi-purpose notation standing for every version of $\bar{X}$. Some authors take the additional logical step of defining "the conditional expectation" to be the equivalence class of all versions, then use $E_{\mathcal{F}}X$ to denote a representative of that class and write "$E_{\mathcal{F}}X = \bar{X}$ almost surely" to mean that $\bar{X}$ is a version. e) Integrability. For this reason, we may call $\bar{X}$ the orthogonal projection of $X$ onto $\mathcal{F}$, and we call the defining property 1.4b the "projection property" to suggest this picture.
Existence of conditional expectations
This uniqueness up to equivalence extends to $E_{\mathcal{F}}X$ for arbitrary $X$ for which $EX$ exists; see also (f) below. d) Language. Given the uniqueness up to equivalence, the definite article in "the conditional expectation" is a slight abuse of language.
Properties similar to expectations
On the measurable space $(\Omega, \mathcal{F})$, then, $P$ is a probability measure and $Q$ is a measure that is absolutely continuous with respect to $P$. Then each $\bar{X}_n$ is a version of $E_{\mathcal{F}}X_n$, and $(\bar{X}_n)$ is an increasing sequence in $\mathcal{F}_+$; let $\bar{X}$ be its limit.
Special properties
Conditioning as projection
Conditional expectations given random variables
Finally, the notation $E(X \mid Y = y)$ is used and read "the conditional expectation of $X$ given that $Y = y$" despite its annoying ambiguity. It is also used when $P\{Y = y\} = 0$ for all $y$, and the correct interpretation then is that it is a notation for $f(y)$, where $f$ is such that $f \circ Y = E_{\sigma Y}X$.
Regular versions
For example, if $\mathcal{F}$ is generated by a partition $(G_n)$ of $\Omega$, which is the case if $\mathcal{F}$ is generated by a random variable having values in a countable space, then $E_{\mathcal{F}}X = \sum_n 1_{G_n}\, E(1_{G_n}X)/P(G_n)$ over the cells with $P(G_n) > 0$.
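In this partition case the formula is easy to check numerically; a sketch (the variables are ours):

```python
import numpy as np

# Conditional expectation given the partition {Y = 0}, {Y = 1}, {Y = 2}:
# on each cell G, E_F X is the constant E(1_G X) / P(G).
rng = np.random.default_rng(5)
size = 100_000
y = rng.integers(0, 3, size=size)        # Y generates the partition
x = y + rng.normal(size=size)            # an integrable X

cond = np.zeros(size)
for i in range(3):
    cell = (y == i)
    cond[cell] = x[cell].mean()          # empirical E(1_G X) / P(G)

# Projection property: E(1_G X) = E(1_G E_F X) for each cell G.
for i in range(3):
    cell = (y == i)
    print(i, x[cell].sum() / size, cond[cell].sum() / size)
```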
Conditional distributions
Thus, to show that $\mathcal{D} = \mathcal{E}$ via the monotone class theorem, it suffices to show that $[-\infty, t] \in \mathcal{D}$ for every $t$ in $\mathbb{R}$. To show that it is the conditional distribution of $Y$ given $\mathcal{F}$, it remains to verify the projection property of 2.8.
Disintegrations
The preceding part applies to the real random variable $g \circ Y$ and shows the existence of a conditional distribution $\hat{L} : (\omega, B) \mapsto \hat{L}_\omega(B)$ from $(\Omega, \mathcal{F})$ to $(\hat{E}, \hat{\mathcal{E}})$ for $g \circ Y$ given $\mathcal{F}$. Since $Y$ takes values in the standard measurable space $(E, \mathcal{E})$, by Theorem 2.10 there is a regular version $L^o$ of the conditional distribution of $Y$ given $\mathcal{F} = \sigma X$.
Conditional densities
As an illustration of the computations discussed above, we now derive the conditional distribution of $Y$ given $X = Y + Z$. To repeat: given the sum $X = Y + Z$, the conditional distribution of $Y$, namely $B \mapsto K(X, B)$, is the Gaussian distribution with mean $\frac{1}{2}X$ and variance $\frac{1}{2}$.
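A Monte Carlo check of this computation (the binning tolerance is ours):

```python
import numpy as np

# With Y, Z independent standard Gaussians and X = Y + Z, the conditional
# law of Y given X should be Gaussian with mean X/2 and variance 1/2.
rng = np.random.default_rng(6)
y = rng.normal(size=2_000_000)
z = rng.normal(size=2_000_000)
x = y + z

for x0 in (-1.0, 0.0, 1.5):
    sel = np.abs(x - x0) < 0.02                      # condition on X near x0
    print(f"X ~ {x0:+.1f}:  mean(Y) = {y[sel].mean():+.3f}"
          f" (expect {x0 / 2:+.3f}),  var(Y) = {y[sel].var():.3f} (expect 0.500)")
```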
Conditional independence of random variables etc
The definition of conditional independence and the previous theorem are stated in terms of positive random variables. In particular, the results of this chapter give the existence and construction of the Lebesgue measure on (0,1), and hence the existence of all measures ever discussed.
Construction of chains: description of data and goal
Now think about the conditions on $P$: it must assign a probability to each event, and do so in such a way that the countable additivity condition is satisfied for every disjoint sequence of events. The proofs are not illuminating, but the constructions leading to the theorem clarify many of the concepts discussed earlier.
Construction and analysis
After all, with respect to each random variable there are at least as many events as there are points in $\mathbb{R}$, and there are infinitely many random variables, and $\mathcal{H}$ must include all those events together with their complements and all countable unions and intersections thereof. Of course, $X_n$ takes values in $(E_n, \mathcal{E}_n)$; for each outcome $\omega$ of the experiment, $X_n(\omega)$ is the result of the $n$th trial. And $(X_0, \ldots, X_n)$ indicates the results of the trials up to and including the $n$th; it is a random variable taking values in $(F_n^o, \mathcal{F}_n^o)$ defined by 4.1.
Ionescu-Tulcea’s theorem
It follows that the map $P$ from $\mathcal{H}^o$ into $[0,1]$ is finitely additive: if $G$ and $H$ are in $\mathcal{H}^o$ and disjoint, then there is $n$ such that they both belong to $\mathcal{F}_n$, and therefore $P(G \cup H) = P(G) + P(H)$. Suppose for the time being that $\lim_k P(H_k) > 0$; we will show that this leads to a contradiction.
Initial distribution
Kolmogorov extension theorem
Note that $\hat\pi_n$ has the representation 4.4: this is trivial for $n = 0$ and, assuming it holds for $n$, it follows from the disintegration theorem 2.18 with $(D, \mathcal{D}) = (E, \mathcal{E})^{J_n}$ that it holds for $n+1$ as well (the standardness of $(E, \mathcal{E})$ is used here). Thus, the Ionescu-Tulcea theorem is used to show that there exists a probability measure $P_J$ on $(E, \mathcal{E})^J$ such that the image of $P_J$ under the projection $p^J_{J_n}$ is $\hat\pi_n$ for every $n$.
Independent sequences
We may take the $J_n$ all the same by replacing each $J_n$ with $J = \bigcup J_n$; and then $H = \{X_J \in A\}$ with $A = \bigcup A_n$, where the $A_n$ are disjoint. This section is devoted to special cases of the probability spaces constructed in the preceding section, as well as certain alternatives to such constructions.
Markov chains
Such chains can be made homogeneous in time by including time in the state space: let $\hat{E} = \mathbb{N} \times E$ and $\hat{\mathcal{E}} = 2^{\mathbb{N}} \otimes \mathcal{E}$, and define the Markov kernel $\hat{P}$ on $(\hat{E}, \hat{\mathcal{E}})$ accordingly. Then $\hat{X}_n = (n \bmod d, X_n)$, $n \in \mathbb{N}$, forms a time-homogeneous Markov chain with state space $(\hat{E}, \hat{\mathcal{E}})$ and transition kernel $\hat{P}$.
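On a finite state space, a time-homogeneous chain is simulated by drawing $X_{n+1}$ from the row $P(X_n, \cdot)$ of the transition kernel; a minimal sketch with a toy kernel of our own choosing:

```python
import numpy as np

# Transition kernel as a stochastic matrix: row x is the law of X_{n+1}
# given X_n = x.
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])

rng = np.random.default_rng(7)
x, n_steps = 0, 100_000                   # initial state X_0 = 0
visits = np.zeros(3)
for _ in range(n_steps):
    x = rng.choice(3, p=P[x])             # draw X_{n+1} from row P[X_n]
    visits[x] += 1
print("occupation frequencies:", visits / n_steps)
```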
Markov processes, continuous time
For the Markov chains introduced so far, the conditional distribution of $X_{n+1}$ given the entire past $\mathcal{F}_n$ depended only on the last state $X_n$. In some applications, for example if $X_n$ is to denote the weather on day $n$, it is desirable that the dependence on the past be somewhat deeper: for a fixed integer $k \ge 2$, the conditional distribution of $X_{n+1}$ may be allowed to depend on the last $k$ states.
Random fields
This chapter aims to introduce the vocabulary for describing the evolution of random systems over time. Given a stochastic process $X = (X_t)_{t \in T}$, letting $\mathcal{F}_t = \sigma\{X_s : s \le t\}$ for each time $t$, we obtain a filtration $\mathcal{F} = (\mathcal{F}_t)_{t \in T}$; it is called the filtration generated by $X$.
Adaptedness
Stopping times
If one can build an alarm system that sounds exactly at time $T$, and only at time $T$, then $T$ is a stopping time of $\mathcal{F}$. Heuristically speaking, $T_k$ is a stopping time because it is possible to build an alarm system that sounds exactly at the time of the $k$th arrival.
Conventions for the end of time
That is, $V \in \mathcal{F}_T$ if and only if $V 1_{\{T \le t\}} \in \mathcal{F}_t$ for every $t$ in $\bar{T}$, which is the claim of the first statement. Indeed, the last paragraph of the preceding proof shows what is required of $Y$: for each $t$, the mapping $(s, \omega) \mapsto Y_s(\omega)$ from $B_t \times \Omega$ into $\bar{\mathbb{R}}$ must be $\mathcal{B}_t \otimes \mathcal{F}_t$-measurable.
Comparing different pasts
The information accumulated by $S$ must be less than that accumulated by $T$. The following shows this and gives further comparisons for general $S$ and $T$. 1.16 Theorem. Let $S$ and $T$ be stopping times of $\mathcal{F}$. Then: a) $S \wedge T$ and $S \vee T$ are stopping times of $\mathcal{F}$. i) Since $S$ and $T$ are stopping times, the events $\{S \le t\}$ and $\{T \le t\}$ are in $\mathcal{F}_t$ for each time $t$.
Times foretold
Approximation by discrete stopping times
Conditioning at stopping times
With the exception of the claim about repeated conditioning, these are all merely restatements of the definition of conditional expectations and Theorem IV.1.10. Show that an arbitrary time $T$ is a stopping time of $\mathcal{F}$ if and only if, for every pair of outcomes $\omega$ and $\omega'$, ...
Uniformly integrable martingales
In the special case where the $R_n$ are positive, the martingale $M$ is considered a reasonable model for the evolution of the price of a share of stock. Then $M_n$ stands for the price of the stock at time $n$, and $R_{n+1}$ is interpreted as the return at time $n+1$ per dollar invested in that stock at time $n$.
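A simulation of this multiplicative picture (the return distribution is ours, chosen so that $ER_n = 1$):

```python
import numpy as np

# Stock-price martingale: M_n = M_0 * R_1 * ... * R_n with i.i.d. positive
# returns satisfying E R = 1, so E(M_{n+1} | F_n) = M_n and E M_n = M_0.
rng = np.random.default_rng(8)
paths, n = 100_000, 50
r = rng.choice([0.8, 1.2], size=(paths, n))      # E R = 0.5*0.8 + 0.5*1.2 = 1
m = 100.0 * np.cumprod(r, axis=1)                # M_0 = 100
print("E M_n for n = 10, 25, 50:",
      m[:, 9].mean(), m[:, 24].mean(), m[:, 49].mean())   # each ~ 100
```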