
Exercises


1. Consider the posterior distribution on the initial condition, given by Theorem 2.11, in the case of deterministic dynamics. In the case of Example 2.4, program p2.m plots the prior and posterior distributions for this problem for data generated with true initial condition v0 = 0.1. Why is the posterior distribution concentrating much closer to 0.9 than to the true initial condition at 0.1? Change the mean of the prior from 0.7 to 0.3. What do you observe regarding the effect on the posterior? Explain what you observe. Illustrate your findings with graphics.

2. Consider the posterior distribution on the initial condition, given by Theorem 2.11, in the case of deterministic dynamics. In the case of Example 2.4, program p3.m approximates the posterior distribution for this problem for data generated with true initial condition v0 = 0.3. Why is the posterior distribution in this case approximately symmetric about 0.5? What happens if the mean of the prior is changed from 0.5 to 0.1? Explain what you observe. Illustrate your findings with graphics.

3. Consider the posterior distribution on the initial condition, given by Theorem 2.11, in the case of deterministic dynamics. In the case of Example 2.4, program p3.m approximates the posterior distribution for this problem. Modify the program so that the prior and data are the same as for the first exercise in this section. Compare the approximation to the posterior obtained by use of program p3.m with the true posterior as computed by program p2.m. Carry out similar comparisons for different choices of prior, ensuring that programs p2.m and p3.m share the same prior and the same data. In all cases, experiment with the choice of the parameter β in the proposal distribution within p3.m, and determine its effect on the displayed approximation of the true posterior computed from p2.m. Illustrate your findings with graphics.

4. Consider the posterior distribution on the initial condition, given by Theorem 2.11, in the case of deterministic dynamics. In the case of Example 2.4, program p3.m approximates the posterior distribution for this problem. Modify the program so that it applies to Example 2.3. Experiment with the choice of the parameter J, which determines the length of the Markov chain simulation, within p3.m. Illustrate your findings with graphics.

5. Consider the posterior distribution on the signal, given by Theorem 2.8, in the case of stochastic dynamics. In the case of Example 2.3, program p4.m approximates the posterior distribution for this problem, using the independence dynamics sampler. Run this program for a range of values of γ. Report and explain the effect of γ on the acceptance probability curves.

6. Consider the posterior distribution on the signal, given by Theorem 2.8, in the case of stochastic dynamics. In the case of Example 2.3, program p5.m approximates the posterior distribution for this problem, using the pCN sampler. Run this program for a range of values of β. Report and explain the effect of β on the acceptance probability curves.

7. Consider the posterior distribution on the signal, given by Theorem 2.8, in the case of stochastic dynamics. In the case of Example 2.3, program p6.m approximates the posterior distribution for this problem, using the pCN dynamics sampler. Run this program for a range of values of σ and J. Report and explain the effect of σ and of J on the acceptance probability curves.

8. Consider the MAP estimator for the posterior distribution on the signal, given by Theorem 3.10, in the case of stochastic dynamics. Program p7.m finds the MAP estimator for Example 2.3. Increase J to 50 and display your results graphically. Now repeat your experiments for the values γ = 0.01, 0.1, and 10, and display and discuss your findings. Repeat the experiments using the “truth” as the initial condition for the minimization. What effect does this have? Explain this effect.

9. Prove Theorem 3.12.

10. Consider application of the RWM proposal (3.15), applied in the case of stochastic dynamics. Find the form of the Metropolis–Hastings acceptance probability in this case.

11. Consider the family of probability measures μ on R with Lebesgue density proportional to exp(−V(u)), with V(u) given by (3.23). Prove that the family of measures μ is locally Lipschitz in the Hellinger metric and in the total variation metric.

Discrete Time: Filtering Algorithms

In this chapter, we describe various algorithms for the filtering problem. Recall from Section 2.4 that filtering refers to the sequential update of the probability distribution on the state given the data, as data is acquired, and that Yj = {yl}l=1,…,j denotes the data accumulated up to time j. The filtering update from time j to time j + 1 may be broken into two steps: prediction, which is based on the equation for the state evolution, using the Markov kernel for the stochastic or deterministic dynamical system that maps P(vj|Yj) into P(vj+1|Yj); and analysis, which incorporates data via Bayes’s formula and maps P(vj+1|Yj) into P(vj+1|Yj+1).

All but one of the algorithms we study (the optimal proposal version of the particle filter) will also reflect these two steps.
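As a concrete, if toy, illustration of this two-step structure, the following Python sketch runs one prediction/analysis cycle for a finite-state Markov chain; the three-state kernel and the likelihood values are invented for the example and do not come from the text.

```python
import numpy as np

# Toy three-state Markov kernel for the dynamics: entry [i, j] is
# P(v_{j+1} = state j | v_j = state i), so rows sum to one.
kernel = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.6, 0.2],
                   [0.1, 0.1, 0.8]])

def predict(p):
    # Prediction: map P(v_j | Y_j) to P(v_{j+1} | Y_j) via the Markov kernel.
    return kernel.T @ p

def analyse(p, likelihood):
    # Analysis: Bayes's formula -- reweight by P(y_{j+1} | v_{j+1}), normalize.
    q = likelihood * p
    return q / q.sum()

p = np.array([1.0, 0.0, 0.0])                       # P(v_j | Y_j)
p_hat = predict(p)                                  # P(v_{j+1} | Y_j)
p_new = analyse(p_hat, np.array([0.1, 0.3, 0.6]))   # P(v_{j+1} | Y_{j+1})
```

The same predict/analyse alternation, with Gaussians in place of discrete distributions, is exactly what the Kalman filter of Section 4.1 implements.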

We begin in Section 4.1 with the Kalman filter, which provides an exact algorithm to determine the filtering distribution for linear problems with additive Gaussian noise. Since the filtering distribution is Gaussian in this case, the algorithm comprises an iteration that maps the mean and covariance from time j to time j + 1. In Section 4.2, we show how the idea of Kalman filtering may be used to combine a dynamical model with data for nonlinear problems; in this case, the posterior distribution is not Gaussian, but the algorithms proceed by invoking a Gaussian ansatz in the analysis step of the filter. This results in algorithms that do not provably approximate the true filtering distribution in general; in various forms, they are, however, robust to use in high dimensions. In Section 4.3, we introduce the particle filter methodology, which leads to provably accurate estimates of the true filtering distribution but which is, in its current forms, poorly behaved in high dimensions. The algorithms in Sections 4.1–4.3 are concerned primarily with stochastic dynamics, but setting Σ = 0 yields the corresponding algorithms for deterministic dynamics. In Section 4.4, we study the long-time behavior of some of the filtering algorithms introduced in the previous sections. Finally, in Section 4.5, we present some numerical illustrations and conclude with bibliographic notes and exercises in Sections 4.6 and 4.7.

For clarity of exposition, we again recall the form of the data assimilation problem. The signal is governed by the model of equations (2.1):

vj+1 = Ψ(vj) + ξj, j ∈ Z+, v0 ∼ N(m0, C0),

Electronic supplementary material The online version of this chapter (doi: 10.1007/978-3-319-20325-6_4) contains supplementary material, which is available to authorized users.

© Springer International Publishing Switzerland 2015
K. Law et al., Data Assimilation, Texts in Applied Mathematics 62, DOI 10.1007/978-3-319-20325-6_4


where ξ = {ξj}j∈Z+ is an i.i.d. sequence, independent of v0, with ξ0 ∼ N(0, Σ). The data is given by equation (2.2):

yj+1 = h(vj+1) + ηj+1, j ∈ Z+,

where h : Rn → Rm and η = {ηj}j∈Z+ is an i.i.d. sequence, independent of (v0, ξ), with η1 ∼ N(0, Γ).
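A minimal simulation of this signal/data pair can be written in a few lines. The sketch below uses Python (rather than the MATLAB programs referenced in the exercises), with scalar choices of Ψ, h, Σ, and Γ that are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative scalar model (assumed values, not from the text):
# Psi(v) = 0.9 v, h(v) = v, Sigma = 0.01, Gamma = 0.04.
Psi = lambda v: 0.9 * v
h = lambda v: v
Sigma, Gamma = 0.01, 0.04
m0, C0 = 0.0, 1.0
J = 100

v = np.empty(J + 1)
y = np.empty(J + 1)   # y[0] is unused; the data starts at j = 1
v[0] = m0 + np.sqrt(C0) * rng.standard_normal()
for j in range(J):
    # Signal: v_{j+1} = Psi(v_j) + xi_j,      xi_j    ~ N(0, Sigma)
    v[j + 1] = Psi(v[j]) + np.sqrt(Sigma) * rng.standard_normal()
    # Data:   y_{j+1} = h(v_{j+1}) + eta_{j+1}, eta_{j+1} ~ N(0, Gamma)
    y[j + 1] = h(v[j + 1]) + np.sqrt(Gamma) * rng.standard_normal()
```

The arrays v and y play the roles of the true signal and the accumulated data Yj that every filtering algorithm in this chapter consumes.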

4.1 Linear Gaussian Problems: The Kalman Filter

This algorithm provides a sequential method for updating the filtering distribution P(vj|Yj) from time j to time j + 1, when Ψ and h are linear maps. In this case, the filtering distribution is Gaussian, and it can be characterized entirely through its mean and covariance. To see this, we note that the prediction step preserves Gaussianity by Lemma 1.5; the analysis step preserves Gaussianity because it is an application of Bayes’s formula (1.7), and then Lemma 1.6 establishes the required Gaussian property, since the log pdf is quadratic in the unknown.

To be concrete, we let

Ψ(v) = M v, h(v) = Hv (4.2)

for matrices M ∈ Rn×n, H ∈ Rm×n. We assume that m ≤ n and Rank(H) = m. We let (mj, Cj) denote the mean and covariance of vj|Yj, noting that this entirely characterizes the random variable, since it is Gaussian. We let (m̂j+1, Ĉj+1) denote the mean and covariance of vj+1|Yj, noting that this, too, completely characterizes the random variable, since it is also Gaussian. We now derive the map (mj, Cj) → (mj+1, Cj+1), using the intermediate variables (m̂j+1, Ĉj+1), so that we may compute the prediction and analysis steps separately. This gives the Kalman filter in a form in which the update is expressed in terms of precision rather than covariance.

Theorem 4.1. Assume that C0, Γ, Σ > 0. Then Cj > 0 for all j∈ Z+, and

Cj+1−1 = (M Cj MT + Σ)−1 + HT Γ−1 H, (4.3a)
Cj+1−1 mj+1 = (M Cj MT + Σ)−1 M mj + HT Γ−1 yj+1. (4.3b)

Proof We assume for the purposes of induction that Cj > 0, noting that this is true for j = 0 by assumption. The prediction step is determined by (2.1) in the case Ψ(·) = M·:

vj+1 = M vj + ξj, ξj ∼ N(0, Σ).

From this, it is clear that

E(vj+1|Yj) = E(M vj|Yj) + E(ξj|Yj).

Since ξj is independent of Yj, we have

m̂j+1 = M mj. (4.4)

Similarly,

E[(vj+1 − m̂j+1) ⊗ (vj+1 − m̂j+1)|Yj] = E[M(vj − mj) ⊗ M(vj − mj)|Yj] + E[ξj ⊗ ξj|Yj]
+ E[M(vj − mj) ⊗ ξj|Yj] + E[ξj ⊗ M(vj − mj)|Yj].

Again, since ξj is independent of Yj and vj, we have

Ĉj+1 = M E((vj − mj) ⊗ (vj − mj)|Yj) MT + Σ
= M Cj MT + Σ. (4.5)

Note that Ĉj+1 > 0, because Cj > 0 by the inductive hypothesis and Σ > 0 by assumption.

Now we consider the analysis step. By (2.31), which is just Bayes’s formula, and using Gaussianity, we have

exp(−½|v − mj+1|²Cj+1) ∝ exp(−½|Γ−1/2(yj+1 − Hv)|² − ½|Ĉj+1−1/2(v − m̂j+1)|²) (4.6a)
= exp(−½|yj+1 − Hv|²Γ − ½|v − m̂j+1|²Ĉj+1). (4.6b)

Equating quadratic terms in v gives, since Γ > 0 by assumption,

Cj+1−1 = Ĉj+1−1 + HT Γ−1 H, (4.7)

and equating linear terms in v gives1

Cj+1−1 mj+1 = Ĉj+1−1 m̂j+1 + HT Γ−1 yj+1. (4.8)

Substituting the expressions (4.4) and (4.5) for (m̂j+1, Ĉj+1) gives the desired result. It remains to verify that Cj+1 > 0. From (4.7), it follows, since Γ−1 > 0 by assumption and Ĉj+1 > 0 (proved above), that Cj+1−1 > 0. Hence Cj+1 > 0, and the induction is complete. □

We may now reformulate the Kalman filter using covariances directly, rather than using precisions.

Corollary 4.2. Under the assumptions of Theorem 4.1, the formulas for the Kalman filter given there may be rewritten as follows:

dj+1 = yj+1 − H m̂j+1,
Sj+1 = H Ĉj+1 HT + Γ,
Kj+1 = Ĉj+1 HT Sj+1−1,
mj+1 = m̂j+1 + Kj+1 dj+1,
Cj+1 = (I − Kj+1 H) Ĉj+1,

with (m̂j+1, Ĉj+1) given in (4.4), (4.5).
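These formulas map directly onto code. The following Python sketch performs one prediction/analysis cycle in the covariance form of Corollary 4.2; the matrices M, H, Σ, Γ in the example at the bottom are illustrative assumptions, not taken from the text.

```python
import numpy as np

def kalman_step(m, C, y, M, H, Sigma, Gamma):
    """One Kalman cycle: (m_j, C_j) and y_{j+1} -> (m_{j+1}, C_{j+1})."""
    # Prediction step, equations (4.4) and (4.5).
    m_hat = M @ m
    C_hat = M @ C @ M.T + Sigma
    # Analysis step, Corollary 4.2.
    d = y - H @ m_hat                      # innovation d_{j+1}
    S = H @ C_hat @ H.T + Gamma            # S_{j+1}
    K = C_hat @ H.T @ np.linalg.inv(S)     # Kalman gain K_{j+1}
    m_new = m_hat + K @ d
    C_new = (np.eye(len(m)) - K @ H) @ C_hat
    return m_new, C_new

# Illustrative 2-state, 1-observation example (assumed values).
M = np.array([[1.0, 0.1], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Sigma, Gamma = 0.01 * np.eye(2), np.array([[0.1]])
m, C = np.zeros(2), np.eye(2)
m, C = kalman_step(m, C, np.array([0.5]), M, H, Sigma, Gamma)
```

Note that the only matrix inverted is S, which lives in the data space of dimension m; this is the practical advantage of Corollary 4.2 discussed in Remark 4.3.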

Proof By (4.7), we have

Cj+1−1 = Ĉj+1−1 + HT Γ−1 H,

and application of Lemma 4.4 below gives

Cj+1 = Ĉj+1 − Ĉj+1 HT(Γ + H Ĉj+1 HT)−1 H Ĉj+1
= (I − Ĉj+1 HT(Γ + H Ĉj+1 HT)−1 H) Ĉj+1
= (I − Ĉj+1 HT Sj+1−1 H) Ĉj+1
= (I − Kj+1 H) Ĉj+1,

1 We do not need to match the constant terms (with respect to v), since the normalization constant in Bayes’s theorem deals with matching these.

as required. Then the identity (4.8) gives

mj+1 = Cj+1 Ĉj+1−1 m̂j+1 + Cj+1 HT Γ−1 yj+1
= (I − Kj+1 H) m̂j+1 + Cj+1 HT Γ−1 yj+1. (4.9)

Now note that again by (4.7),

Cj+1(Ĉj+1−1 + HT Γ−1 H) = I,

so that

Cj+1 HT Γ−1 H = I − Cj+1 Ĉj+1−1
= I − (I − Kj+1 H)
= Kj+1 H.

Since H has rank m, we deduce that

Cj+1 HT Γ−1 = Kj+1.

Hence (4.9) gives

mj+1 = (I − Kj+1 H) m̂j+1 + Kj+1 yj+1 = m̂j+1 + Kj+1 dj+1,

as required. □
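A quick numerical sanity check confirms that the precision form (4.3) and the covariance form of Corollary 4.2 produce the same (mj+1, Cj+1); the random matrices below are generated test data only.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 2
M = rng.standard_normal((n, n))
H = rng.standard_normal((m, n))
A = rng.standard_normal((n, n)); Sigma = A @ A.T + n * np.eye(n)  # > 0
B = rng.standard_normal((m, m)); Gamma = B @ B.T + m * np.eye(m)  # > 0
Cj = np.eye(n)
mj = rng.standard_normal(n)
y = rng.standard_normal(m)

# Shared prediction step (4.4), (4.5).
C_hat = M @ Cj @ M.T + Sigma
m_hat = M @ mj

# Precision form (4.3): inversion in the state space (dimension n).
C_prec = np.linalg.inv(np.linalg.inv(C_hat) + H.T @ np.linalg.inv(Gamma) @ H)
m_prec = C_prec @ (np.linalg.inv(C_hat) @ m_hat
                   + H.T @ np.linalg.inv(Gamma) @ y)

# Covariance form (Corollary 4.2): inversion in the data space (dimension m).
S = H @ C_hat @ H.T + Gamma
K = C_hat @ H.T @ np.linalg.inv(S)
C_cov = (np.eye(n) - K @ H) @ C_hat
m_cov = m_hat + K @ (y - H @ m_hat)
```

Both routes agree to floating-point accuracy, as the proof above guarantees.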

Remark 4.3. The key difference between the Kalman update formulas in Theorem 4.1 and those in Corollary 4.2 is that in the former, matrix inversion takes place in the state space, with dimension n, while in the latter, matrix inversion takes place in the data space, with dimension m. In many applications, m ≪ n, since the observed subspace dimension is much less than the state-space dimension, and thus the formulation in Corollary 4.2 is frequently employed in practice. The quantity dj+1 is referred to as the innovation at time step j + 1. It measures the mismatch of the predicted state from the data. The matrix Kj+1 is known as the Kalman gain.

The following matrix identity was used to derive the formulation of the Kalman filter in which inversion takes place in the data space.

Lemma 4.4. Woodbury Matrix Identity Let A ∈ Rp×p, U ∈ Rp×q, C ∈ Rq×q, and V ∈ Rq×p. If A and C are positive, then A + UCV is invertible, and

(A + UCV)−1 = A−1 − A−1 U (C−1 + V A−1 U)−1 V A−1.
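The identity is easy to verify numerically; the positive definite A and C and the arbitrary U, V below are generated test data, not drawn from the text.

```python
import numpy as np

rng = np.random.default_rng(2)
p, q = 4, 2
U = rng.standard_normal((p, q))
V = rng.standard_normal((q, p))
A0 = rng.standard_normal((p, p)); A = A0 @ A0.T + p * np.eye(p)  # positive definite
C0 = rng.standard_normal((q, q)); C = C0 @ C0.T + q * np.eye(q)  # positive definite

Ai, Ci = np.linalg.inv(A), np.linalg.inv(C)
# Left side: direct inversion of the p x p matrix A + UCV.
lhs = np.linalg.inv(A + U @ C @ V)
# Right side: Woodbury formula, inverting only the q x q matrix C^{-1} + V A^{-1} U.
rhs = Ai - Ai @ U @ np.linalg.inv(Ci + V @ Ai @ U) @ V @ Ai
```

In the Kalman setting, p = n and q = m, so the right-hand side trades an n × n inversion for an m × m one, which is exactly how Corollary 4.2 was obtained from Theorem 4.1.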
