Chapter IV: Machine learning of model error
4.6 Numerical Experiments
4.6.4 Initializing and Stabilizing the RNN
As mentioned in Remark 4.5.17, the RNN approximates an enlarged system which contains solutions of the original system as trajectories confined to the invariant manifold $m = m^\dagger(x, y)$; see identity (4.39). However, this invariant manifold may be unstable, either as a manifold within the continuous-time model (4.38), or as a result of numerical instability. We now demonstrate this with numerical experiments. This instability points to the need for data assimilation to be used with RNNs if prediction of the original system is desired, not only to initialize the system but also to stabilize the dynamics so that they remain near the desired invariant manifold.
To illustrate these challenges, we consider the problem of modeling evolution of
a single component of the L63 system (4.40). Consider this as variable $x$ in (4.7).
As exhibited in (4.38), model error may be addressed in this setting by learning a representation that contains the hidden states $y$ in (4.7) (i.e. the other two unobserved components of (4.40)). Since the dimension of the hidden states is typically not known a priori, the dimension of the latent variables in the RNN (and of the system it approximates) may be greater than that of $y$; in the specific construction we use to prove the existence of an approximating RNN, we introduce a vector field for the evolution of the error $m$ as well as $y$. We now discuss the implications of embedding the true dynamics in a higher-dimensional system in the specific context of the embedded system (4.38). However, the observations apply to any embedding of the desired dynamics (4.7) (with $\varepsilon = 1$) within any higher-dimensional system.
We choose examples for which (4.39) implies that $m - m^\dagger$ is constant in time. Then, under (4.38),
\[
\bigl(m - m^\dagger(x, y)\bigr)(t) = \text{constant}
\]
along every trajectory. The desired invariant manifold (where the constant is 0) is thus stable. However, this stability only holds in a neutral sense: linearization about the manifold exhibits a zero eigenvalue related to translation of $m - m^\dagger$ by a constant.
We now illustrate that this embedded invariant manifold can be unstable; in this case the instability is caused by numerical integration, which breaks the conservation of $m - m^\dagger$ in time.
Example 1: Consider equation (4.40), which we write in the form (4.8) by setting $x = u_x$ and $y = (u_y, u_z)$. We then let $f_0(u_x) := -a u_x$, yielding $m^\dagger(u_y) = a u_y$. Thus $f^\dagger = f_0 + m^\dagger$ is defined by the first component of the right-hand side of (4.40).
The function $g^\dagger(u_y, u_z)$ is then given by the second and third components of the right-hand side of (4.40). Applying the methodology leading to (4.38) to (4.40) results in the following four-dimensional system:
\begin{align}
\dot{u}_x &= f_0(u_x) + m, & u_x(0) &= x_0, \tag{4.44a}\\
\dot{u}_y &= b u_x - u_y - u_x u_z, & u_y(0) &= y_0, \tag{4.44b}\\
\dot{u}_z &= -c u_z + u_x u_y, & u_z(0) &= z_0, \tag{4.44c}\\
\dot{m} &= a\bigl(b u_x - u_y - u_x u_z\bigr), & m(0) &= m^\dagger(y_0). \tag{4.44d}
\end{align}
Here we have omitted the $u_y$-dependence from the equation (4.40) for $u_x$, and aim to learn this error term; we introduce the variable $m$ in order to do so. This system, when projected onto $u_x, u_y, u_z$, behaves identically to (4.40) when $m(0) = m^\dagger(y_0)$.
Thus the 4-dimensional system in (4.44) has an embedded invariant manifold on which the dynamics coincide with those of the 3-dimensional L63 system.
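The neutral stability of this manifold can be made concrete with a short calculation. The following is a sketch in the notation of (4.44); the symbol $\delta$ is introduced here purely for illustration and denotes a hypothetical offset in the initial condition for $m$.

\begin{align*}
\frac{d}{dt}\bigl(m - a u_y\bigr) &= a\bigl(b u_x - u_y - u_x u_z\bigr) - a\,\dot{u}_y = 0,\\
m(0) = m^\dagger(y_0) + \delta \;&\Longrightarrow\; m(t) = a u_y(t) + \delta \quad \text{for all } t \ge 0,\\
\dot{u}_x &= -a u_x + m = a(u_y - u_x) + \delta.
\end{align*}

In exact arithmetic, then, a trajectory started off the manifold evolves as L63 with a constant forcing $\delta$ in the first equation; the offset neither decays nor grows.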
We numerically integrate the 4-dimensional system in (4.44) for 10000 model time units (initialized at $x_0 = 1$, $y_0 = 3$, $z_0 = 1$, $m_0 = a y_0 = 30$), and show in Figure 4.12 that the resulting measure for $u_x$ (dashed red) is nearly identical to its invariant measure in the traditional 3-dimensional L63 system in (4.40) (solid black). However, when we re-run the simulation with a perturbed $m(0) = m^\dagger(y_0) + 1$, we see in Figure 4.12 (dotted blue) that this yields a different invariant measure for $u_x$. This result emphasizes the importance of correctly initializing an RNN not only for efficient trajectory forecasting, but also for accurate statistical representation of long-time behavior.
Figure 4.12: Here, we show that the invariant density for the first component of L63 (black) can be reproduced by a correctly initialized augmented 4-d system (dashed red) in (4.44). However, incorrect initialization of $m(0)$ in (4.44) (dotted blue) yields a different invariant density.
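The persistence of an initialization error in Example 1 can be checked numerically. The sketch below is illustrative only and is not the code used to produce Figure 4.12: it integrates (4.44) with a hand-written fixed-step RK4 scheme and records $m - a u_y$ along the trajectory. The value $a = 10$ follows from $m_0 = a y_0 = 30$; the values $b = 28$ and $c = 8/3$ are assumed to be the classical L63 parameters, and the function names, step size and time horizon are hypothetical choices.

```python
# Minimal sketch (assumptions: b = 28, c = 8/3, fixed-step RK4, short horizon).
A, B, C = 10.0, 28.0, 8.0 / 3.0


def rhs_444(s):
    """Right-hand side of (4.44) for the state s = (u_x, u_y, u_z, m)."""
    ux, uy, uz, m = s
    duy = B * ux - uy - ux * uz
    return (-A * ux + m, duy, -C * uz + ux * uy, A * duy)


def rk4_step(f, s, dt):
    """One classical fourth-order Runge-Kutta step of size dt."""
    k1 = f(s)
    k2 = f(tuple(x + 0.5 * dt * k for x, k in zip(s, k1)))
    k3 = f(tuple(x + 0.5 * dt * k for x, k in zip(s, k2)))
    k4 = f(tuple(x + dt * k for x, k in zip(s, k3)))
    return tuple(
        x + dt / 6.0 * (p + 2.0 * q + 2.0 * r + w)
        for x, p, q, r, w in zip(s, k1, k2, k3, k4)
    )


def offset_history(delta, t_end=5.0, dt=1e-3):
    """Values of m - a*u_y along a run started at m(0) = a*y0 + delta."""
    s = (1.0, 3.0, 1.0, A * 3.0 + delta)  # x0 = 1, y0 = 3, z0 = 1
    history = [s[3] - A * s[1]]
    for _ in range(round(t_end / dt)):
        s = rk4_step(rhs_444, s, dt)
        history.append(s[3] - A * s[1])
    return history


if __name__ == "__main__":
    for delta in (0.0, 1.0):
        hist = offset_history(delta)
        print(delta, max(abs(h - delta) for h in hist))
```

In this sketch the recorded offset stays at its initial value up to floating-point roundoff, because $m - a u_y$ is a linear combination of the state variables and Runge-Kutta steps preserve such linear invariants. The perturbed run therefore never returns to the manifold, which is consistent with the different invariant density reported in the text.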
Example 2: Now we consider (4.40), which we write in the form (4.8) by setting $x = u_z$ and $y = (u_x, u_y)$. We let $f_0(u_z) := -c u_z$ and $m^\dagger(u_x, u_y) := u_x u_y$, so that $f^\dagger = f_0 + m^\dagger$ corresponds to the third component of the right-hand side of (4.40).
The function $g^\dagger(u_x, u_y)$ is defined by the first two components of the right-hand side of (4.40). We again form a 4-dimensional system corresponding to (4.40) using the methodology that leads to (4.38):
\begin{align}
\dot{u}_x &= a(u_y - u_x), & u_x(0) &= x_0, \tag{4.45a}\\
\dot{u}_y &= b u_x - u_y - u_x u_z, & u_y(0) &= y_0, \tag{4.45b}\\
\dot{u}_z &= f_0(u_z) + m, & u_z(0) &= z_0, \tag{4.45c}\\
\dot{m} &= u_x \dot{u}_y + u_y \dot{u}_x, & m(0) &= m^\dagger(x_0, y_0). \tag{4.45d}
\end{align}
We integrate (4.45) for 3000 model time units (initialized at $x_0 = 1$, $y_0 = 3$, $z_0 = 1$, $m_0 = x_0 y_0 = 3$), and show in Figure 4.13 that the 3-dimensional Lorenz attractor is unstable with respect to perturbations in the numerical integration of the 4-dimensional system. The solutions for $u_x, u_y, u_z$ eventually collapse to a fixed point once the growing discrepancy between $m(t)$ and $m^\dagger$ becomes too large. The time at which collapse occurs may be delayed by using smaller tolerances within the numerical integrator (we employ Matlab rk45), demonstrating that the instability is caused by the numerical integrator. This collapse is highly undesirable if the goal is prediction of long-time statistics. On the other hand, Figure 4.14 shows short-term accuracy of the 4-dimensional system in (4.45) up to 12 model time units when correctly initialized ($m_0 = m^\dagger(x_0, y_0)$, dashed red), and accuracy up to 8 model time units when the initialization of $m_0$ is perturbed ($m_0 = m^\dagger(x_0, y_0) + 1$, dotted blue).
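The role of the integrator in Example 2 can be illustrated on a short time window. The sketch below is not the experiment behind Figures 4.13 and 4.14: it swaps the adaptive Matlab integrator for a hand-written fixed-step RK4 scheme, so that the step size plays the role of the tolerance, and it measures how far $m - u_x u_y$ drifts from its exact value of zero. The parameter values $b = 28$ and $c = 8/3$ are assumed ($a = 10$ is implied by Example 1), and the function names, step sizes and horizon are hypothetical choices.

```python
# Minimal sketch (assumptions: a = 10, b = 28, c = 8/3, fixed-step RK4 in place
# of an adaptive rk45 integrator, short horizon).
A, B, C = 10.0, 28.0, 8.0 / 3.0


def rhs_445(s):
    """Right-hand side of (4.45) for the state s = (u_x, u_y, u_z, m)."""
    ux, uy, uz, m = s
    dux = A * (uy - ux)
    duy = B * ux - uy - ux * uz
    return (dux, duy, -C * uz + m, ux * duy + uy * dux)


def rk4_step(f, s, dt):
    """One classical fourth-order Runge-Kutta step of size dt."""
    k1 = f(s)
    k2 = f(tuple(x + 0.5 * dt * k for x, k in zip(s, k1)))
    k3 = f(tuple(x + 0.5 * dt * k for x, k in zip(s, k2)))
    k4 = f(tuple(x + dt * k for x, k in zip(s, k3)))
    return tuple(
        x + dt / 6.0 * (p + 2.0 * q + 2.0 * r + w)
        for x, p, q, r, w in zip(s, k1, k2, k3, k4)
    )


def max_drift(dt, t_end=1.0):
    """Largest |m - u_x*u_y| seen on [0, t_end], starting on the manifold."""
    s = (1.0, 3.0, 1.0, 3.0)  # x0 = 1, y0 = 3, z0 = 1, m0 = x0*y0
    worst = 0.0
    for _ in range(round(t_end / dt)):
        s = rk4_step(rhs_445, s, dt)
        worst = max(worst, abs(s[3] - s[0] * s[1]))
    return worst


if __name__ == "__main__":
    for dt in (0.02, 0.005):
        print(dt, max_drift(dt))
```

Unlike the linear invariant of Example 1, the quantity $m - u_x u_y$ is quadratic in the state and is not preserved by an explicit Runge-Kutta step, so the trajectory leaves the manifold even when it starts exactly on it. Reducing the step size reduces the drift over a fixed window, which is in line with the observation that tighter integrator tolerances delay the collapse rather than remove it.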
This result demonstrates the fundamental challenges of representing chaotic attractors in enlarged dimensions and may help explain observations of RNNs yielding good short-term accuracy but inaccurate long-term statistical behavior. While empirical stability has been observed in some discrete-time LSTMs [421,165], the general problem illustrated above is likely to manifest in any problem in which the dimension of the learned model exceeds that of the true model; the issue of how to address initialization of such models, and its interaction with data assimilation, therefore merits further study.