
Exercises


1. Consider the posterior distribution on the initial condition, given by Theorem 2.11, in the case of deterministic dynamics. In the case of Example 2.4, program p2.m plots the prior and posterior distributions for this problem for data generated with true initial condition v0 = 0.1. Why is the posterior distribution concentrating much closer to 0.9 than to the true initial condition at 0.1? Change the mean of the prior from 0.7 to 0.3. What do you observe regarding the effect on the posterior? Explain what you observe. Illustrate your findings with graphics.

2. Consider the posterior distribution on the initial condition, given by Theorem 2.11, in the case of deterministic dynamics. In the case of Example 2.4, program p3.m approximates the posterior distribution for this problem for data generated with true initial condition v0 = 0.3. Why is the posterior distribution in this case approximately symmetric about 0.5? What happens if the mean of the prior is changed from 0.5 to 0.1? Explain what you observe. Illustrate your findings with graphics.

3. Consider the posterior distribution on the initial condition, given by Theorem 2.11, in the case of deterministic dynamics. In the case of Example 2.4, program p3.m approximates the posterior distribution for this problem. Modify the program so that the prior and data are the same as for the first exercise in this section. Compare the approximation to the posterior obtained by use of program p3.m with the true posterior as computed by program p2.m. Carry out similar comparisons for different choices of prior, ensuring that programs p2.m and p3.m share the same prior and the same data. In all cases, experiment with the choice of the parameter β in the proposal distribution within p3.m, and determine its effect on the displayed approximation of the true posterior computed from p2.m. Illustrate your findings with graphics.

4. Consider the posterior distribution on the initial condition, given by Theorem 2.11, in the case of deterministic dynamics. In the case of Example 2.4, program p3.m approximates the posterior distribution for this problem. Modify the program so that it applies to Example 2.3. Experiment with the choice of the parameter J, which determines the length of the Markov chain simulation, within p3.m. Illustrate your findings with graphics.

5. Consider the posterior distribution on the signal, given by Theorem 2.8, in the case of stochastic dynamics. In the case of Example 2.3, program p4.m approximates the posterior distribution for this problem, using the independence dynamics sampler. Run this program for a range of values of γ. Report and explain the effect of γ on the acceptance probability curves.

6. Consider the posterior distribution on the signal, given by Theorem 2.8, in the case of stochastic dynamics. In the case of Example 2.3, program p5.m approximates the posterior distribution for this problem, using the pCN sampler. Run this program for a range of values of β. Report and explain the effect of β on the acceptance probability curves.

7. Consider the posterior distribution on the signal, given by Theorem 2.8, in the case of stochastic dynamics. In the case of Example 2.3, program p6.m approximates the posterior distribution for this problem, using the pCN dynamics sampler. Run this program for a range of values of σ and J. Report and explain the effect of σ and of J on the acceptance probability curves.

8. Consider the MAP estimator for the posterior distribution on the signal, given by Theorem 3.10, in the case of stochastic dynamics. Program p7.m finds the MAP estimator for Example 2.3. Increase J to 50 and display your results graphically. Now repeat your experiments for the values γ = 0.01, 0.1, and 10, and display and discuss your findings. Repeat the experiments using the “truth” as the initial condition for the minimization. What effect does this have? Explain this effect.

9. Prove Theorem 3.12.

10. Consider application of the RWM proposal (3.15), applied in the case of stochastic dynamics. Find the form of the Metropolis–Hastings acceptance probability in this case.

11. Consider the family of probability measures μ on R with Lebesgue density proportional to exp(−V(u)), with V(u) given by (3.23). Prove that the family of measures μ is locally Lipschitz in the Hellinger metric and in the total variation metric.

Discrete Time: Filtering Algorithms

In this chapter, we describe various algorithms for the filtering problem. Recall from Section 2.4 that filtering refers to the sequential update of the probability distribution on the state given the data, as data is acquired, and that Yj = {yl}l=1,…,j denotes the data accumulated up to time j. The filtering update from time j to time j + 1 may be broken into two steps: prediction, which is based on the equation for the state evolution, using the Markov kernel for the stochastic or deterministic dynamical system that maps P(vj|Yj) into P(vj+1|Yj); and analysis, which incorporates data via Bayes’s formula and maps P(vj+1|Yj) into P(vj+1|Yj+1).

All but one of the algorithms we study (the optimal proposal version of the particle filter) will also reflect these two steps.
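As a concrete, if toy, illustration of this two-step structure, the following Python sketch runs one prediction/analysis cycle for a finite-state Markov chain; the three-state kernel and the likelihood values are invented for the example and do not come from the text.

```python
import numpy as np

# Toy three-state Markov kernel for the dynamics: entry [i, j] is
# P(v_{j+1} = state j | v_j = state i), so rows sum to one.
kernel = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.6, 0.2],
                   [0.1, 0.1, 0.8]])

def predict(p):
    # Prediction: map P(v_j | Y_j) to P(v_{j+1} | Y_j) via the Markov kernel.
    return kernel.T @ p

def analyse(p, likelihood):
    # Analysis: Bayes's formula -- reweight by P(y_{j+1} | v_{j+1}), normalize.
    q = likelihood * p
    return q / q.sum()

p = np.array([1.0, 0.0, 0.0])                       # P(v_j | Y_j)
p_hat = predict(p)                                  # P(v_{j+1} | Y_j)
p_new = analyse(p_hat, np.array([0.1, 0.3, 0.6]))   # P(v_{j+1} | Y_{j+1})
```

The same predict/analyse alternation, with Gaussians in place of discrete distributions, is exactly what the Kalman filter of Section 4.1 implements.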

We begin in Section 4.1 with the Kalman filter, which provides an exact algorithm to determine the filtering distribution for linear problems with additive Gaussian noise. Since the filtering distribution is Gaussian in this case, the algorithm comprises an iteration that maps the mean and covariance from time j to time j + 1. In Section 4.2, we show how the idea of Kalman filtering may be used to combine a dynamical model with data for nonlinear problems; in this case, the posterior distribution is not Gaussian, but the algorithms proceed by invoking a Gaussian ansatz in the analysis step of the filter. This results in algorithms that do not provably approximate the true filtering distribution in general; in various forms, they are, however, robust to use in high dimensions. In Section 4.3, we introduce the particle filter methodology, which leads to provably accurate estimates of the true filtering distribution but which is, in its current forms, poorly behaved in high dimensions. The algorithms in Sections 4.1–4.3 are concerned primarily with stochastic dynamics, but setting Σ = 0 yields the corresponding algorithms for deterministic dynamics. In Section 4.4, we study the long-time behavior of some of the filtering algorithms introduced in the previous sections. Finally, in Section 4.5, we present some numerical illustrations and conclude with bibliographic notes and exercises in Sections 4.6 and 4.7.

For clarity of exposition, we again recall the form of the data assimilation problem. The signal is governed by the model of equations (2.1):

vj+1 = Ψ(vj) + ξj, j ∈ Z+, v0 ∼ N(m0, C0),

Electronic supplementary material The online version of this chapter (doi: 10.1007/978-3-319-20325-6_4) contains supplementary material, which is available to authorized users.

© Springer International Publishing Switzerland 2015
K. Law et al., Data Assimilation, Texts in Applied Mathematics 62, DOI 10.1007/978-3-319-20325-6_4


where ξ = {ξj}j∈Z+ is an i.i.d. sequence, independent of v0, with ξ0 ∼ N(0, Σ). The data is given by equation (2.2):

yj+1 = h(vj+1) + ηj+1, j ∈ Z+,

where h : Rn → Rm and η = {ηj}j∈Z+ is an i.i.d. sequence, independent of (v0, ξ), with η1 ∼ N(0, Γ).
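A minimal simulation of this signal/data pair can be written in a few lines. The sketch below uses Python (rather than the MATLAB programs referenced in the exercises), with scalar choices of Ψ, h, Σ, and Γ that are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative scalar model (assumed values, not from the text):
# Psi(v) = 0.9 v, h(v) = v, Sigma = 0.01, Gamma = 0.04.
Psi = lambda v: 0.9 * v
h = lambda v: v
Sigma, Gamma = 0.01, 0.04
m0, C0 = 0.0, 1.0
J = 100

v = np.empty(J + 1)
y = np.empty(J + 1)   # y[0] is unused; the data starts at j = 1
v[0] = m0 + np.sqrt(C0) * rng.standard_normal()
for j in range(J):
    # Signal: v_{j+1} = Psi(v_j) + xi_j,      xi_j    ~ N(0, Sigma)
    v[j + 1] = Psi(v[j]) + np.sqrt(Sigma) * rng.standard_normal()
    # Data:   y_{j+1} = h(v_{j+1}) + eta_{j+1}, eta_{j+1} ~ N(0, Gamma)
    y[j + 1] = h(v[j + 1]) + np.sqrt(Gamma) * rng.standard_normal()
```

The arrays v and y play the roles of the true signal and the accumulated data Yj that every filtering algorithm in this chapter consumes.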

4.1 Linear Gaussian Problems: The Kalman Filter

This algorithm provides a sequential method for updating the filtering distribution P(vj|Yj) from time j to time j + 1, when Ψ and h are linear maps. In this case, the filtering distribution is Gaussian, and it can be characterized entirely through its mean and covariance. To see this, we note that the prediction step preserves Gaussianity by Lemma 1.5; the analysis step preserves Gaussianity because it is an application of Bayes’s formula (1.7), and then Lemma 1.6 establishes the required Gaussian property, since the log pdf is quadratic in the unknown.

To be concrete, we let

Ψ(v) = M v, h(v) = Hv (4.2)

for matrices M ∈ Rn×n, H ∈ Rm×n. We assume that m ≤ n and Rank(H) = m. We let (mj, Cj) denote the mean and covariance of vj|Yj, noting that this entirely characterizes the random variable, since it is Gaussian. We let (m̂j+1, Ĉj+1) denote the mean and covariance of vj+1|Yj, noting that this, too, completely characterizes the random variable, since it is also Gaussian. We now derive the map (mj, Cj) → (mj+1, Cj+1), using the intermediate variables (m̂j+1, Ĉj+1), so that we may compute the prediction and analysis steps separately. This gives the Kalman filter in a form in which the update is expressed in terms of precision rather than covariance.

Theorem 4.1. Assume that C0, Γ, Σ > 0. Then Cj > 0 for all j∈ Z+, and

Cj+1−1 = (M Cj MT + Σ)−1 + HT Γ−1 H, (4.3a)
Cj+1−1 mj+1 = (M Cj MT + Σ)−1 M mj + HT Γ−1 yj+1. (4.3b)

Proof We assume for the purposes of induction that Cj > 0, noting that this is true for j = 0 by assumption. The prediction step is determined by (2.1) in the case Ψ(·) = M·:

vj+1 = M vj + ξj, ξj ∼ N(0, Σ).

From this, it is clear that

E(vj+1|Yj) = E(M vj|Yj) + E(ξj|Yj).

Since ξj is independent of Yj, we have

m̂j+1 = M mj. (4.4)

Similarly,

E[(vj+1 − m̂j+1) ⊗ (vj+1 − m̂j+1)|Yj] = E[M(vj − mj) ⊗ M(vj − mj)|Yj] + E[ξj ⊗ ξj|Yj]
+ E[M(vj − mj) ⊗ ξj|Yj] + E[ξj ⊗ M(vj − mj)|Yj].

Again, since ξj is independent of Yj and vj, we have

Ĉj+1 = M E((vj − mj) ⊗ (vj − mj)|Yj) MT + Σ
= M Cj MT + Σ. (4.5)

Note that Ĉj+1 > 0, because Cj > 0 by the inductive hypothesis and Σ > 0 by assumption.

Now we consider the analysis step. By (2.31), which is just Bayes’s formula, and using Gaussianity, we have

exp(−½|v − mj+1|²Cj+1) ∝ exp(−½|Γ−1/2(yj+1 − Hv)|² − ½|Ĉj+1−1/2(v − m̂j+1)|²) (4.6a)
= exp(−½|yj+1 − Hv|²Γ − ½|v − m̂j+1|²Ĉj+1). (4.6b)

Equating quadratic terms in v gives, since Γ > 0 by assumption,

Cj+1−1 = Ĉj+1−1 + HT Γ−1 H, (4.7)

and equating linear terms in v gives1

Cj+1−1 mj+1 = Ĉj+1−1 m̂j+1 + HT Γ−1 yj+1. (4.8)

Substituting the expressions (4.4) and (4.5) for (m̂j+1, Ĉj+1) gives the desired result. It remains to verify that Cj+1 > 0. From (4.7), it follows, since Γ−1 > 0 by assumption and Ĉj+1 > 0 (proved above), that Cj+1−1 > 0. Hence Cj+1 > 0, and the induction is complete. □

We may now reformulate the Kalman filter using covariances directly, rather than using precisions.

Corollary 4.2. Under the assumptions of Theorem 4.1, the formulas for the Kalman filter given there may be rewritten as follows:

dj+1 = yj+1 − H m̂j+1,
Sj+1 = H Ĉj+1 HT + Γ,
Kj+1 = Ĉj+1 HT Sj+1−1,
mj+1 = m̂j+1 + Kj+1 dj+1,
Cj+1 = (I − Kj+1 H) Ĉj+1,

with (m̂j+1, Ĉj+1) given in (4.4), (4.5).
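These formulas map directly onto code. The following Python sketch performs one prediction/analysis cycle in the covariance form of Corollary 4.2; the matrices M, H, Σ, Γ in the example at the bottom are illustrative assumptions, not taken from the text.

```python
import numpy as np

def kalman_step(m, C, y, M, H, Sigma, Gamma):
    """One Kalman cycle: (m_j, C_j) and y_{j+1} -> (m_{j+1}, C_{j+1})."""
    # Prediction step, equations (4.4) and (4.5).
    m_hat = M @ m
    C_hat = M @ C @ M.T + Sigma
    # Analysis step, Corollary 4.2.
    d = y - H @ m_hat                      # innovation d_{j+1}
    S = H @ C_hat @ H.T + Gamma            # S_{j+1}
    K = C_hat @ H.T @ np.linalg.inv(S)     # Kalman gain K_{j+1}
    m_new = m_hat + K @ d
    C_new = (np.eye(len(m)) - K @ H) @ C_hat
    return m_new, C_new

# Illustrative 2-state, 1-observation example (assumed values).
M = np.array([[1.0, 0.1], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Sigma, Gamma = 0.01 * np.eye(2), np.array([[0.1]])
m, C = np.zeros(2), np.eye(2)
m, C = kalman_step(m, C, np.array([0.5]), M, H, Sigma, Gamma)
```

Note that the only matrix inverted is S, which lives in the data space of dimension m; this is the practical advantage of Corollary 4.2 discussed in Remark 4.3.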

Proof By (4.7), we have

Cj+1−1 = Ĉj+1−1 + HT Γ−1 H,

and application of Lemma 4.4 below gives

Cj+1 = Ĉj+1 − Ĉj+1 HT(Γ + H Ĉj+1 HT)−1 H Ĉj+1
= (I − Ĉj+1 HT(Γ + H Ĉj+1 HT)−1 H) Ĉj+1
= (I − Ĉj+1 HT Sj+1−1 H) Ĉj+1
= (I − Kj+1 H) Ĉj+1,

1 We do not need to match the constant terms (with respect to v), since the normalization constant in Bayes’s theorem deals with matching these.

as required. Then the identity (4.8) gives

mj+1 = Cj+1 Ĉj+1−1 m̂j+1 + Cj+1 HT Γ−1 yj+1
= (I − Kj+1 H) m̂j+1 + Cj+1 HT Γ−1 yj+1. (4.9)

Now note that again by (4.7),

Cj+1(Ĉj+1−1 + HT Γ−1 H) = I,

so that

Cj+1 HT Γ−1 H = I − Cj+1 Ĉj+1−1
= I − (I − Kj+1 H)
= Kj+1 H.

Since H has rank m, we deduce that

Cj+1 HT Γ−1 = Kj+1.

Hence (4.9) gives

mj+1 = (I − Kj+1 H) m̂j+1 + Kj+1 yj+1 = m̂j+1 + Kj+1 dj+1,

as required. □
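A quick numerical sanity check confirms that the precision form (4.3) and the covariance form of Corollary 4.2 produce the same (mj+1, Cj+1); the random matrices below are generated test data only.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 2
M = rng.standard_normal((n, n))
H = rng.standard_normal((m, n))
A = rng.standard_normal((n, n)); Sigma = A @ A.T + n * np.eye(n)  # > 0
B = rng.standard_normal((m, m)); Gamma = B @ B.T + m * np.eye(m)  # > 0
Cj = np.eye(n)
mj = rng.standard_normal(n)
y = rng.standard_normal(m)

# Shared prediction step (4.4), (4.5).
C_hat = M @ Cj @ M.T + Sigma
m_hat = M @ mj

# Precision form (4.3): inversion in the state space (dimension n).
C_prec = np.linalg.inv(np.linalg.inv(C_hat) + H.T @ np.linalg.inv(Gamma) @ H)
m_prec = C_prec @ (np.linalg.inv(C_hat) @ m_hat
                   + H.T @ np.linalg.inv(Gamma) @ y)

# Covariance form (Corollary 4.2): inversion in the data space (dimension m).
S = H @ C_hat @ H.T + Gamma
K = C_hat @ H.T @ np.linalg.inv(S)
C_cov = (np.eye(n) - K @ H) @ C_hat
m_cov = m_hat + K @ (y - H @ m_hat)
```

Both routes agree to floating-point accuracy, as the proof above guarantees.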

Remark 4.3. The key difference between the Kalman update formulas in Theorem 4.1 and those in Corollary 4.2 is that in the former, matrix inversion takes place in the state space, with dimension n, while in the latter, matrix inversion takes place in the data space, with dimension m. In many applications, m ≪ n, since the observed subspace dimension is much less than the state-space dimension, and thus the formulation in Corollary 4.2 is frequently employed in practice. The quantity dj+1 is referred to as the innovation at time step j + 1. It measures the mismatch of the predicted state from the data. The matrix Kj+1 is known as the Kalman gain.

The following matrix identity was used to derive the formulation of the Kalman filter in which inversion takes place in the data space.

Lemma 4.4. Woodbury Matrix Identity Let A ∈ Rp×p, U ∈ Rp×q, C ∈ Rq×q, and V ∈ Rq×p. If A and C are positive, then A + UCV is invertible, and

(A + UCV)−1 = A−1 − A−1 U (C−1 + V A−1 U)−1 V A−1.
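The identity is easy to verify numerically; the positive definite A and C and the arbitrary U, V below are generated test data, not drawn from the text.

```python
import numpy as np

rng = np.random.default_rng(2)
p, q = 4, 2
U = rng.standard_normal((p, q))
V = rng.standard_normal((q, p))
A0 = rng.standard_normal((p, p)); A = A0 @ A0.T + p * np.eye(p)  # positive definite
C0 = rng.standard_normal((q, q)); C = C0 @ C0.T + q * np.eye(q)  # positive definite

Ai, Ci = np.linalg.inv(A), np.linalg.inv(C)
# Left side: direct inversion of the p x p matrix A + UCV.
lhs = np.linalg.inv(A + U @ C @ V)
# Right side: Woodbury formula, inverting only the q x q matrix C^{-1} + V A^{-1} U.
rhs = Ai - Ai @ U @ np.linalg.inv(Ci + V @ Ai @ U) @ V @ Ai
```

In the Kalman setting, p = n and q = m, so the right-hand side trades an n × n inversion for an m × m one, which is exactly how Corollary 4.2 was obtained from Theorem 4.1.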
