
6.4.2.1 Lorenz 96 Multi-scale Model

We first studied the multi-scale Lorenz 96 system from (6.19). The numerical examples of the multi-scale Lorenz 96 system are summarized below (a minimal simulation sketch of the two-scale dynamics follows the list):

(i) For $c = 10$ and $h = 1$ in the multi-scale Lorenz 96 system, a dictionary-learning-based model is trained as the local deterministic error model $\delta(X_k)$ using direct data ($\{x_k, hcz_k\}$), and a fully-connected neural network is trained using indirect data (first and second moments of the slow variable $x_k$). For the indirect data in this example, we assume partial observation of the first eight slow variables and include cross-terms of the second moments (i.e., $\mathbb{E}(x_i x_j)$ for different $i$ and $j$). The results are presented in Figs. 6.2 and 6.3.

(ii) For $c = 3$ and $h = 10/3$ in the multi-scale Lorenz 96 system, the scale separation between fast and slow variables becomes smaller and thus leads to a more challenging case. In this case, a fully-connected neural network is trained as the local deterministic error model $\delta(X_k)$ using either direct data ($\{x_k, hcz_k\}$) or indirect data (first to fourth moments of the slow variable $x_k$ and the autocorrelation of $x_k$). For the indirect data in this and the next example, we enable full observation of all 36 slow variables and preclude the use of all cross-terms of the second to fourth moments. The results are presented in Figs. 6.4 and 6.5.

(iii) For $c = 3$ and $h = 10/3$ in the multi-scale Lorenz 96 system, we trained a non-local deterministic error model $\delta(X) = \sum_{k_0} \delta(X_{k_0}) \, \mathcal{C}(k - k_0; \theta_{\text{non-loc}})$, a local stochastic error model with additive noise $\delta(X_k) + \sqrt{\sigma^2} \, \dot{W}_k$, and a local stochastic error model with multiplicative noise $\delta(X_k) + \sqrt{\sigma^2(X_k)} \, \dot{W}_k$, using indirect data (first to fourth moments of the slow variable $x_k$ and the autocorrelation of $x_k$). The results are presented in Figs. 6.6 to 6.8.

Figure 6.2a presents the direct data $\{x_k, hcz_k\}$. Based on the direct data, a regression model $\delta(X_k)$ can be trained and then used to simulate the dynamical system of $X_k$ in (6.22). In this work, we train such a regression model using a fully-connected neural network with two hidden layers (five neurons at the first hidden layer and one neuron at the second hidden layer). It can be seen in Fig. 6.2a that the trained model captures the general pattern of the training data. We also simulate the dynamical system in (6.22) for a long time trajectory and compare the invariant measure of $X_k$ with that of the true system in (6.19). As shown in Fig. 6.2b, we obtain a good agreement between the invariant measures of the modeled system and the true system.
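As a rough illustration of this direct-training step, the sketch below (continuing the previous one) fits a fully-connected network with the stated two hidden layers to the collected pairs; the choice of scikit-learn and its optimizer defaults are assumptions, not the chapter's implementation.

import numpy as np
from sklearn.neural_network import MLPRegressor

# flatten the (x_k, h*c*z_k) pairs collected in the previous sketch
X_train = np.concatenate([p[0] for p in pairs]).reshape(-1, 1)
target = np.concatenate([p[1] for p in pairs])

# two hidden layers with five and one neurons, as stated in the text
delta_net = MLPRegressor(hidden_layer_sizes=(5, 1), activation="tanh",
                         max_iter=5000).fit(X_train, target)

# delta_net.predict(X.reshape(-1, 1)) now plays the role of delta(X_k)
# when simulating the reduced slow dynamics in (6.22)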

[Figure 6.2 appears here: panel (a), "Trained model", plots $\delta(X)$ against $X$ (training data vs. neural network); panel (b), "Invariant measure", plots the probability density of $X$ (truth vs. prediction).]

Figure 6.2: Direct training of the error model ($c = 10$) using a neural network, with results of (a) the trained error model and (b) invariant measures.

Because direct data $\{x_k, hcz_k\}$ may not be accessible in some applications, we also explored the use of indirect data to calibrate the error model $\delta(X_k)$. In this example, the first and second order moments of the first eight components of $x_k$ are used for the calibration. We tested different approaches to parameterizing the error model $\delta(X_k)$, including dictionary learning (Fig. 6.3), GPs, and neural networks.
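A minimal sketch of assembling such an indirect-data vector from a slow-variable trajectory is given below; the stacking order and the restriction of cross-terms to one copy of each pair $(i, j)$ are illustrative choices. The error-model parameters are then adjusted so that the same statistics, computed from a simulation of (6.22), match this vector.

import numpy as np

def indirect_statistics(traj, n_obs=8):
    # traj: array of shape (T, K) of slow-variable snapshots;
    # only the first n_obs components are observed
    x = traj[:, :n_obs]
    first = x.mean(axis=0)                                 # E[x_i]
    second = (x[:, :, None] * x[:, None, :]).mean(axis=0)  # E[x_i x_j], cross-terms included
    iu = np.triu_indices(n_obs)                            # keep each (i, j) pair once
    return np.concatenate([first, second[iu]])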

The error model based on dictionary learning has the form $\delta(X_k) = \sum_{i=1}^{2} \alpha_i \phi_i(X_k)$, where we choose the basis function dictionary $\phi_i(X_k) \in \{\tanh(\beta_1 X_k), \tanh(\beta_2 X_k^2)\}$. Therefore we have $\{\alpha_i, \beta_i\}_{i=1}^{2}$ as the unknown parameters to be learned. Instead of using polynomial basis functions, we introduced the hyperbolic tangent function $\tanh(\cdot)$ to enhance numerical stability. The error model based on a GP has the form $\delta(X_k) = \sum_{j=1}^{7} \alpha_j \mathcal{K}(X_k^{(j)}, X_k; \psi)$, where we chose the $X_k^{(j)}$ as seven fixed points uniformly distributed in $[-15, 15]$, and $\mathcal{K}$ as a squared exponential kernel with unknown constant hyper-parameters $\psi = \{\sigma_{\mathrm{GP}}, \ell\}$, where $\sigma_{\mathrm{GP}}$ denotes the standard deviation and $\ell$ the length scale of the kernel. The results with a GP and with a neural network are similar to those with dictionary learning in Fig. 6.3 and are omitted here. The calibrated models of all three tests lead to good agreement in both the data and the invariant measure, and the performance of the calibrated model is not sensitive to the specific choice of parameterization approach.

Although the performance of the calibrated error model is not sensitive to either the type of data or the parameterization approach for this numerical example with $c = 10$, we emphasize that the specific choices made in constructing and calibrating an error model are still important in general, especially for more challenging scenarios, e.g., when the resolved and unresolved degrees of freedom have less noticeable scale separation.

[Figure 6.3 appears here: panel (a), "Data comparison", plots predicted against true first and second moments; panel (b), "Invariant measure", plots the probability density of $X$ (truth vs. prediction).]

Figure 6.3: Indirect training of the error model ($c = 10$) using dictionary learning, with the results of (a) first and second order moments and (b) invariant measures. The results with a GP and a neural network have similar performance and are omitted here.

To illustrate the advantage of using indirect data and stochastic/non-local error models, we studied a more challenging scenario in which the scale separation between $x_k$ and $y_{j,k}$ in (6.19) is narrower, obtained by setting $h = 10/3$ and $c = 3$ for the slow and fast dynamics. It can be seen in Fig. 6.4a that the general pattern of the direct data is still captured by the trained error model $\delta(X_k)$. However, the comparison of invariant measures in Fig. 6.4b shows that the long-time behaviour of the trained model does not agree well with the true system, indicating the limitation of using only direct data to calibrate a local and deterministic error model.

[Figure 6.4 appears here: panel (a), "Trained model", plots $\delta(X)$ against $X$ (training data vs. neural network); panel (b), "Invariant measure", plots the probability density of $X$ (truth vs. prediction).]

Figure 6.4: Direct training of the error model ($c = 3$) using a neural network, with results of (a) the trained error model and (b) invariant measures.

We further investigated the use of indirect data. Specifically, the first four moments of $x_k$ and ten points sampled from the averaged autocorrelation function of $x_k$ are used as training data. Figure 6.5 presents the results of the calibrated local model $\delta(X_k)$. It can be seen in Fig. 6.5a that the trained error model provides a good agreement with the training data, while the invariant measures in Fig. 6.5b still demonstrate a noticeable difference between the calibrated and the true systems, indicating an overfitting of the training data.

[Figure 6.5 appears here: panel (a), "Data comparison", plots predicted against true first to fourth moments and autocorrelation; panel (b), "Invariant measure", plots the probability density of $X$ (truth vs. prediction).]

Figure 6.5: Indirect training of the error model ($c = 3$) using a deterministic model (local), with the results of (a) first four moments and autocorrelation, and (b) invariant measures.

To avoid the overfitting seen in Fig. 6.5, we calibrated a non-local error model as discussed in Section 6.3.5. Compared to the results of the local error model, it can be seen in Fig. 6.6 that the invariant measure of the calibrated system agrees better with that of the true system, which indicates that a closure model $\delta(\cdot)$ in (6.22) with non-local effects better characterizes the closure for unresolved scales when there is less clear scale separation between resolved and unresolved scales.
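A minimal sketch of such a non-local model is given below: the output of a pointwise (local) model is spread over neighboring modes by a periodic kernel, so that modes influence each other at a distance; the Gaussian shape is an illustrative stand-in for $\mathcal{C}(\cdot\,; \theta_{\text{non-loc}})$.

import numpy as np

def nonlocal_delta(X, delta_local, theta, K=36):
    # periodic (circular) distances |k - k0| on the ring of K slow modes
    d = np.arange(K)
    d = np.minimum(d, K - d)
    C = np.exp(-d**2 / (2 * theta**2))      # assumed Gaussian kernel shape
    C /= C.sum()                            # normalized weights
    local = delta_local(X)                  # delta(X_{k0}) for every k0
    # circular convolution: entry k is sum_{k0} delta(X_{k0}) C(k - k0)
    return np.real(np.fft.ifft(np.fft.fft(local) * np.fft.fft(C)))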

We also explored the learning of stochastic error models for this example. Figure 6.7 presents the results of the calibrated system with an additive stochastic error model. Compared to the results of the deterministic error models in Figs. 6.5 and 6.6, we can see that the invariant measure of the calibrated system demonstrates a better agreement with the true system in Fig. 6.7b. We further tested the stochastic error model by also learning a state-dependent diffusion coefficient. As shown in Fig. 6.8b, the calibrated system then achieves even better agreement with the invariant measure of the true system, which confirms that increased flexibility in the stochastic error model can help improve predictive performance when training against indirect data. It should be noted that Figs. 6.6 to 6.8 do not display a single converged invariant measure: the calibrated parameters vary across the ensemble and do not reach consensus at a single value, leading to a family of structural error models that all fit the data with similar accuracy.
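The following sketch shows how the two stochastic closures enter the time stepping, via one Euler–Maruyama step of (6.22); f_resolved and sigma2_of_state are hypothetical stand-ins for the resolved tendency and the learned diffusion parameterization.

import numpy as np

def euler_maruyama_step(X, f_resolved, delta, sigma2_of_state, dt=1e-3,
                        rng=np.random.default_rng()):
    drift = f_resolved(X) + delta(X)        # resolved tendency plus error model
    # additive noise: sigma2_of_state returns a constant sigma^2;
    # multiplicative noise: it returns a state-dependent array sigma^2(X_k)
    noise = np.sqrt(sigma2_of_state(X) * dt) * rng.standard_normal(X.shape)
    return X + dt * drift + noise           # sqrt(sigma^2) dW_k contribution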

[Figure 6.6 appears here: panel (a), "Data comparison", plots predicted against true first to fourth moments and autocorrelation; panel (b), "Invariant measure", plots the probability density of $X$ (truth vs. prediction).]

Figure 6.6: Indirect training of the error model ($c = 3$) using a deterministic model (non-local), with the results of (a) first four moments and autocorrelation, and (b) invariant measures.

We surmise that this lack of consensus is caused by the fact that the indirect data contain only partial information about the invariant measure.

Nonetheless, the fits are all far superior to that obtained with the local deterministic model.
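Since the text refers to a parameter ensemble, the sketch below shows one update of an ensemble Kalman inversion, one natural reading of the calibration used here, although the exact algorithm is an assumption; G, y_obs, and Gamma are generic placeholders for the parameter-to-statistics map, the observed indirect data, and the noise covariance.

import numpy as np

def eki_update(thetas, G, y_obs, Gamma):
    # thetas: (J, p) ensemble of parameter vectors; G maps a parameter
    # vector to the vector of simulated indirect statistics (length m)
    Gs = np.array([G(t) for t in thetas])               # (J, m) forward evaluations
    dtheta = thetas - thetas.mean(axis=0)
    dG = Gs - Gs.mean(axis=0)
    C_tG = dtheta.T @ dG / (len(thetas) - 1)            # parameter-output covariance
    C_GG = dG.T @ dG / (len(thetas) - 1)                # output covariance
    gain = C_tG @ np.linalg.inv(C_GG + Gamma)           # Kalman-type gain
    # shift every member toward the observed statistics; the residual spread
    # across members yields the family of error models noted above
    return thetas + (y_obs - Gs) @ gain.T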

[Figure 6.7 appears here: panel (a), "Data comparison", plots predicted against true first to fourth moments and autocorrelation; panel (b), "Invariant measure", plots the probability density of $X$ (truth vs. prediction).]

Figure 6.7: Indirect training of the error model ($c = 3$) using a stochastic model with additive noise, with the results of (a) first four moments and autocorrelation, and (b) invariant measures.
