Incomplete Models and Noisy Data


Academic year: 2023


Introduction

Background

From a Bayesian perspective, we can view (1) as our prior knowledge and (2) as our data—the two are held in balance, and uncertainty in one can be counterbalanced by confidence in the other to yield a similar posterior estimate. To test this hypothesis, we use the methodology proposed in Chapter 4 to learn specific feeding corrections to the minimal Bergman model using simulated data. Chapter 6 reviews and unifies a number of approaches to learning structural errors in dynamical systems models.

Contributions

Chapter 7 focuses on viewing protein dimerization networks in biology through the mathematical lens of function approximation theory. Chapter 3 is taken directly from [6] (submitted to Chaos in 2023), and focuses on the introduction of a simple stochastic differential equation (SDE) model for blood glucose dynamics.

Notation

šŗš‘ (mg/dl) represents the basal glucose (i.e. the average of the unforced process), š›¾ (1/min) is the decay rate for the exponential mean reversal, š›½(mg/(dl*U)) is a proportionality constant for the linear effect of insulin-based IV glucose removal, andšœŽ controls the variance of the oscillations described byš‘Š(š‘”). In this work, we only consider the autocorrelation of the same element in the vectorš‘„, i.e. š‘„š‘˜(š‘”)š‘„š‘˜(0).

Ensemble Kalman methods with constraints

Introduction

  • Overview
  • Literature Review
  • Our Contribution

Standard ensemble Kalman methods solve a quadratic optimization problem that balances the relative confidence in model and data predictions; these optimization problems have closed-form analytical solutions. Within this setting, there are three main methodologies: the extended Kalman filter, the unscented Kalman filter, and the ensemble Kalman filter.
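As a minimal sketch of the quadratic problem mentioned above: the analysis step minimizes a sum of data-misfit and prior-misfit terms, and its closed-form minimizer is the familiar Kalman update. The function below is illustrative (variable names are mine, not the thesis notation) and assumes Gaussian observation noise with covariance Gamma.

```python
import numpy as np

def kalman_update(m, C, y, H, Gamma):
    """Closed-form minimizer of the quadratic objective
    J(v) = |y - H v|^2_{Gamma^-1} + |v - m|^2_{C^-1},
    i.e. the standard Kalman analysis step."""
    S = H @ C @ H.T + Gamma              # innovation covariance
    K = C @ H.T @ np.linalg.inv(S)       # Kalman gain
    m_new = m + K @ (y - H @ m)          # updated mean
    C_new = (np.eye(len(m)) - K @ H) @ C # updated covariance
    return m_new, C_new
```

Ensemble Kalman methods replace C with an empirical covariance computed from an ensemble of particles, which is what makes them derivative-free.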

Data Assimilation

Sections 2.2 to 2.4 describe the ensemble Kalman filter (EnKF) methodology for state estimation, with and without constraints. Among the most popular data assimilation methods, and the one we focus on, is the Ensemble Kalman Filter (EnKF) [128], which can be very beneficial due to its derivative-free nature.

Ensemble Kalman Filter

An equivalent formulation of the minimization problem is now given using a penalized Lagrangian approach to incorporate the property that the solution of the optimization problem lies in the domain of the empirical covariance. This perspective is particularly useful when additional constraints are imposed on the solution of the optimization problem.

Constrained Ensemble Kalman Filter

Assume that the constraint sets of (2.15) and (2.16) are non-empty; then v* exists and is unique, and for all ε > 0, v_ε exists and is unique. To prove the existence and uniqueness of the solution of (2.15), notice that it can be reformulated.

Blood glucose forecasting experiments

  • Type 2 diabetes self-monitoring data
  • Intensive care unit patient dataset
  • Ultradian model of glucose-insulin dynamics

The upper half shows the dynamics of the observed component (model solutions in blue; observations in yellow); the bottom half shows the expanded state space (both hidden components are shown in gray). The upper half shows the dynamics of the observed component (model solutions in blue; observations in yellow); the lower half shows the expanded state space (all 10 hidden components are shown in gray).

Figure 2.1 compares the overall distribution of updated state means over time when running EnKF with and without these state constraints.

SDE-based reduced-order model of glucose-insulin dynamics

Introduction

  • Clinical settings
  • Advantages of a linear SDE model

Another possibility is that the additional expressivity of the nonlinear models is not of the right kind and thus does not offer much additional predictive advantage (despite clear mechanistic validity). Comparison of fitting data for T2DM and intensive care patients reveals interesting structural differences in their glucose regulation.

Literature Review

In Section 3.4, we construct the framework for formulating the parameter estimation problem and its solution. Section 3.6 presents the numerical results on parameter estimation and prediction, together with some uncertainty quantification (UQ) results, separately for T2DM and ICU settings.

Modeling framework

  • Model construction
  • Event-Time Model
  • T2DM
  • ICU

We use simple models for the meal function m(t) and the insulin delivery function I(t) (defined in Section 3.3.2) that allow explicit solution of the continuous-time model between events. Recall once again that G_b is the basal glucose value.
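The explicit between-event solution can be sketched as follows for the simplest homogeneous case (no active meal or insulin forcing): glucose relaxes exponentially toward the basal value G_b at rate γ. Parameter values here are illustrative, not from the thesis.

```python
import numpy as np

def glucose_between_events(G0, t, G_b=100.0, gamma=0.05):
    """Deterministic part of the model between events:
    G(t) = G_b + (G0 - G_b) * exp(-gamma * t),
    i.e. exponential mean reversion toward the basal glucose G_b."""
    return G_b + (G0 - G_b) * np.exp(-gamma * t)
```

Because this solution is available in closed form, the model can be advanced from one event time to the next without numerical integration.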

Parameter Estimation

  • Bayesian Formulation
  • Optimization
  • MCMC

Thus we obtain the likelihood of the data given the glucose time series and the parameters. The distribution P(θ|y) is the conditional distribution of the random model parameters θ given the data y.
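Written out as a standard statement of Bayes' rule for parameters θ and data y:

```latex
P(\theta \mid y) \;=\; \frac{P(y \mid \theta)\, P(\theta)}{P(y)} \;\propto\; P(y \mid \theta)\, P(\theta).
```

The two approaches listed above then correspond to the two standard ways of using this posterior: optimization maximizes it (a MAP estimate), while MCMC draws samples from it for uncertainty quantification.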

Methods, Datasets, and Experimental Design

  • T2DM
    • Model, Parameters, and Dataset
    • Forecasting
  • ICU
    • Model, Parameters, and Dataset
    • Parameter Estimation and Uncertainty Quantification
  • Evaluation Metrics

For each patient, the dataset consists of meal times, meal glucose, and BG measurements along with measurement times. Then we select the right end point of the test series, which will be the next BG measurement time.

Table 3.2: Information about the dataset that is used in the ICU setting, collected from three ICU patients who are not T2DM

Numerical Results

  • T2DM
    • Parameter Estimation
    • Forecasting
    • Comparison of Forecasting Accuracy with LDP
  • ICU
    • Parameter Estimation
    • Forecasting
    • Comparison of Forecasting Accuracy With The

Figure 3.5 also shows that the time evolution of the estimated parameters is realistic within the ICU context. Furthermore, due to the non-stationary and sparse nature of the ICU data, it is more difficult to accurately estimate some model parameters.

Figure 3.1: Parameter estimation and uncertainty quantification in the T2DM setting.

Conclusion

š‘…(šœƒ); Thus, the proposed methodology is an adjustment of the empirical risk minimization described in the previous section. In the discrete time frame, larger values ​​of šœ– can magnify the complexity of the mismatch ĪØ0(š‘„) āˆ’ĪØā€ (š‘„).

Table 3.7: Forecasting results obtained with optimization approach for the MSG model with the nutrition function given in (3.26)

Machine learning of model error

Introduction

  • Background and Literature Review
    • Data-Driven Modeling of Dynamical Systems
    • Hybrid Mechanistic and Data-Driven Modeling
    • Non-Markovian Data-Driven Modeling
    • Noisy Observations and Data Assimilation
    • Applications of Data-Driven Modeling
  • Our Contributions

These methods are inherently tied to discrete-time representations of the data and, while highly successful in many applied contexts, are of less value when the goal of data-driven learning is to fit continuous-time models; this is a desirable modeling goal in many environments. We provide an overarching framework for learning model errors from (potentially noisy) data in dynamic system settings, studying both discrete- and continuous-time models, along with both memoryless (Markovian) and memory-dependent representations of the model error.

Dynamical Systems Setting

  • Continuous-Time
  • Discrete-Time
  • Partially Observed Systems (Continuous-Time)
  • Partially Observed Systems (Discrete-Time)

However, these models are often simplifications of the true system and can therefore be improved with data-driven approaches. Our goal is to use machine learning to find a Markov model in which x is part of the state variable, using our nominal knowledge f_0 and observations of a trajectory {x(t)}.
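A common concrete form of such a hybrid model, and one aligned with the random feature maps discussed later in this chapter, augments the nominal dynamics f_0 with a correction that is linear in its trainable parameters. The sketch below is illustrative; the dimensions, feature count, and activation are arbitrary choices, not the thesis's settings.

```python
import numpy as np

def make_random_features(d, n_features=200, seed=0):
    """Random feature map phi(x) = tanh(W x + b) with fixed random W, b.
    A learned model error m(x; Theta) = Theta @ phi(x) is then linear
    in the trainable matrix Theta."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_features, d))
    b = rng.uniform(-np.pi, np.pi, n_features)
    return lambda x: np.tanh(W @ x + b)

def hybrid_rhs(x, f0, Theta, phi):
    """Right-hand side of the hybrid model dx/dt = f0(x) + Theta @ phi(x):
    nominal physics plus a learned Markovian correction."""
    return f0(x) + Theta @ phi(x)
```

Because the correction is linear in Theta, fitting it from residual data reduces to a (regularized) linear least-squares problem.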

Statistical Learning for Ergodic Dynamical Systems

We can use these definitions to frame the problem of learning model errors in the language of statistical learning [416]. In Section 4.5.2 we return to the study of excess risk and generalization error in the context of models for m† that are linear in their parametric dependence, under ergodicity assumptions on the data-generating process.
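The standard statistical-learning quantities referred to here can be written as follows (my notation: μ the invariant measure of the data-generating process, m† the true model error, m(·;θ) the hypothesis):

```latex
R(\theta) = \mathbb{E}_{x \sim \mu}\,\lVert m^\dagger(x) - m(x;\theta)\rVert^2,
\qquad
R_T(\theta) = \frac{1}{T}\int_0^T \lVert m^\dagger(x(t)) - m(x(t);\theta)\rVert^2\,dt .
```

The excess risk is then R(θ̂) − inf_θ R(θ), and the generalization error is the gap R(θ̂) − R_T(θ̂) between the population risk and its empirical (time-averaged) counterpart; ergodicity is what makes R_T a consistent proxy for R.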

Parameterization of the Loss Function

  • Continuous-Time Markovian Learning
  • Discrete-Time Markovian Learning
  • Continuous-Time Non-Markovian Learning
    • Optimization; Hard Constraint For Missing Dynamics
    • Optimization; Weak Constraint For Missing Dynamics
    • Optimization; Data Assimilation For Missing Dynamics
  • Discrete-Time Non-Markovian Learning

We have presented a machine learning framework in the continuous-time Markovian setting, but it can be adapted to discrete-time and non-Markovian settings. The recent paper [157] proposes an interesting, and more computationally efficient, approach to model error learning in the presence of memory.

Underpinning Theory

  • Markovian Modeling with Random Feature Maps
    • Continuous-Time
    • Discrete-Time
  • Learning Theory for Markovian Models with Linear Hypothesis Classes
  • Non-Markovian Modeling with Recurrent Neural Networks
    • Linear Coupling

Finally, we note the dependence of the risk and generalization error bounds on the size of the model error. The proof is provided in Section 4.8.2; it is a direct consequence of the approximation power of f_1, f_2 and the Gronwall lemma.

Numerical Experiments

  • Summary of Findings from Numerical Experiments
  • Learning Markovian Model Errors from Noise-Free Data
    • Experimental Set-Up
    • Evaluation Criteria
    • Lorenz ’63 (L63)
    • Lorenz ’96 Multiscale (L96MS) System
  • Learning From Partial, Noisy Observations
    • Lorenz ’63
    • Lorenz ’96 Multiscale ( šœ€ = 2 āˆ’1 )
  • Initializing and Stabilizing the RNN

Autocorrelation: We compare the autocorrelation function (ACF) with respect to the invariant distribution of the true and learned models. We now discuss the implications of embedding the true dynamics in a higher-dimensional system, in the specific context of the embedded system (4.38).
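Under the ergodicity assumptions used throughout this chapter, the ACF with respect to the invariant distribution can be estimated by a time average along a single long trajectory. A minimal sketch of that estimator:

```python
import numpy as np

def acf(x, max_lag):
    """Empirical autocorrelation <x(t) x(0)> of a scalar time series,
    normalized so that lag 0 equals 1, computed as an (assumed ergodic)
    time average over a single trajectory."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()          # center the series
    var = x.var()
    n = len(x)
    return np.array([np.dot(x[:n - l], x[l:]) / ((n - l) * var)
                     for l in range(max_lag + 1)])
```

Comparing acf(true_trajectory, L) against acf(learned_trajectory, L) gives the kind of invariant-statistics diagnostic described above.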

Figure 4.1: Here we visualize an example of the function m_1 in (4.41), which is obtained as a single random draw from a zero-mean Gaussian process mapping R^3 → R^3

Conclusions

The bottom figure shows the divergence between the numerically integrated model error term m(t) and the state-dependent term m†; this growing discrepancy is likely responsible for the eventual collapse of the 4-dimensional system. We prove universal approximation for continuous-time hybrid RNNs and demonstrate successful implementation of the methodology.

Figure 4.13: Here we show that the embedded 3-dimensional manifold of L63, within the 4-dimensional system given by (4.45), is unstable

Supplementary Information

  • Proof of Excess Risk/Generalization Error Theorem
  • Proof of Continuous-Time ODE Approximation Theorem
  • Proof of Continuous-Time RNN Approximation Theorem
  • Random Feature Approximation
  • Derivation of Tikhonov-Regularized Linear Inverse Problem

Note that š¹ is šæāˆ’Lipschitz in maximum norm on šµ(0.2šœŒš‘‡), for some šæ associated with Lipschitz continuity š‘“0, š‘“1 and š‘“2. Note that š¹ šæāˆ’Lipschitz in the maximal norm, for some šæ is related to the Lipschitz continuity š‘“0, the approximation parameterization šœƒš›æ and the regularity of the nonlinear activation functionšœŽ.

Abstract

Introduction

Standard models, such as Dalla Man, Rizza and Cobelli [99], focus on the glycemic impact of carbohydrates in a meal: carbohydrates are broken down into glucose molecules and then absorbed into the blood. However, these models typically ignore other macronutrients, such as fat, fiber, and protein, which are known to contribute substantially to the amount and timing of glucose uptake into the blood.

Background on modelling glucose-insulin dynamics

However, this approach introduces additional hand-crafted parameters to interpret quantities that cannot be observed outside of a laboratory setting, such as gastric glucose concentration over time after a meal. Instead of deriving u_G from a complex model of the human body, this approach represents u_G directly using a parametric function fitted from the data.

Phenomenologically modelling the absorption rate

A simple heuristic choice is a square function a_i(t) = g_i 1_[0,w)(t − t_i)/w, where the width w of the square is a free parameter and g_i ∈ R is the amount of glucose produced from the meal. Having defined our parametric function, we now discuss how to learn the parameters in an environment that is realistic outside the laboratory.
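The square-pulse choice can be sketched directly; note the normalization by w means each pulse integrates to g_i, so the total absorbed glucose is independent of the chosen width. Function and argument names here are mine.

```python
import numpy as np

def square_absorption(t, meal_times, meal_glucose, w=30.0):
    """Square-pulse absorption rate: meal i contributes
    a_i(t) = g_i * 1_[0, w)(t - t_i) / w,
    so each pulse has height g_i / w and integrates to g_i."""
    t = np.atleast_1d(t).astype(float)
    u = np.zeros_like(t)
    for t_i, g_i in zip(meal_times, meal_glucose):
        u += np.where((t >= t_i) & (t < t_i + w), g_i / w, 0.0)
    return u
```

Richer parametric shapes (and the neural parameterization discussed in this chapter) can replace the square pulse while keeping the same unit-mass-per-meal convention.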

Experiments

Our neural model can get very close to the ground truth u_G, especially in the tails, as shown in Figure 5.1 (left). This results in significantly better predictions, and our neural model accurately tracks the true glucose values and absorption rates, even extrapolating to periods much longer than those seen during training.

Figure 5.1: Left: (Top) “Absorption templates” used to generate u_G. (Bottom) 5 samples of ground truth and learned u_G for meals from the test set

Discussion

An additional complication when using direct data is ensuring the stability of the dynamical system with the calibrated error model δ(·). Given data at J different points X^(j) for j = 1, 2, …, J, the mean of the error model can be written as a linear combination of basis functions.
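For a Gaussian process error model, that linear combination is the familiar kernel form m(x) = Σ_j α_j k(x, X^(j)). A minimal sketch, assuming a squared-exponential kernel and a small noise/jitter term (both my choices, not the thesis's):

```python
import numpy as np

def gp_mean(X_train, delta_train, X_query, lengthscale=1.0, noise=1e-6):
    """Posterior mean of a GP error model: a linear combination
    m(x) = sum_j alpha_j k(x, X^(j)) of kernel basis functions,
    with alpha = (K + noise*I)^{-1} delta."""
    def k(A, B):  # squared-exponential kernel matrix
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / lengthscale**2)
    K = k(X_train, X_train) + noise * np.eye(len(X_train))
    alpha = np.linalg.solve(K, delta_train)
    return k(X_query, X_train) @ alpha
```

With small noise the posterior mean interpolates the direct data {X^(j), δ(X^(j))}, while larger noise trades interpolation for smoothness, which can help with the stability concern mentioned above.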

Learning About Structural Errors in Dynamical Systems

Introduction

The effect of unresolved scales is modeled by subgrid-scale models, such as the classic Smagorinsky model [390], which depends on empirical parameters P (e.g., the Smagorinsky coefficient). Here we show different ways of constructing models for structural errors and show how we can learn about structural errors from direct or indirect data in the absence of derivatives of the dynamical system.

Figure 6.1: External and internal approaches to modeling structural errors in complex dynamical systems.

Calibrating Error Models

  • Data Availability
    • Direct Data
    • Indirect Data
  • Methods of Calibration
    • Gradient-based or Derivative-free Optimization
    • Enforcing Constraints

Using direct data {X, δ(X)} for calibration leads to a regression problem that can be solved by standard methods for a given parameterization of the error model (e.g., least squares for dictionary learning, gradient descent methods for neural networks). There are different types of constraints that can be enforced when calibrating an error model.
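The dictionary-learning case reduces to linear least squares: with δ(X; θ) = Σ_j θ_j φ_j(X), the direct data give a design matrix of dictionary evaluations. A sketch with Tikhonov regularization (the dictionary and regularization strength here are illustrative):

```python
import numpy as np

def fit_dictionary(X, delta, phis, reg=1e-8):
    """Least-squares fit of delta(X) ~ sum_j theta_j phi_j(X) from
    direct data {X, delta(X)}, via regularized normal equations."""
    Phi = np.column_stack([phi(X) for phi in phis])  # design matrix
    A = Phi.T @ Phi + reg * np.eye(Phi.shape[1])
    theta = np.linalg.solve(A, Phi.T @ delta)
    return theta
```

If the dictionary {φ_j} is well chosen this recovers the error model exactly; as noted below, a misspecified dictionary caps the expressivity of the fit, which motivates the Gaussian process and neural network alternatives.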

Constructing Error Models

  • Dictionary Learning
  • Gaussian Processes
  • Neural Networks
  • Stochastic Extension
  • Representing Non-local Effects

Taking the additive error model Ẋ = f(X) + δ(X; θ) as an example, the function R corresponding to the energy conservation constraint can be written explicitly. On the other hand, this approach can be too restrictive when the dictionary of basis functions {φ_j} is misspecified, resulting in an insufficiently expressive error model.

Lorenz 96 Systems as Illustrative Examples

  • Lorenz Models Considered
    • Multiscale Lorenz 96 Model
    • Single-scale Lorenz 96 Model
  • Numerical Results for Lorenz Models
    • Lorenz 96 Multi-scale Model
    • Lorenz 96 Single-scale Model

We train a fully connected neural network as the error model δ(·) using indirect data (the first to fourth moments of the state variable x_k and the autocorrelation of x_k). As shown in Fig. 6.11, the calibrated error model with energy conservation leads to a modeled system that fits the training data better and achieves good agreement with the invariant measure of the true system.

Figure 6.2: Direct training of the error model (N = 10) using a neural network, with results of (a) the trained error model and (b) invariant measures.

Human Glucose-Insulin Model as Illustrative Example

Without enforcing energy conservation in the error model, we can see in Figure 6.10b that the performance of the calibrated model is comparable to that of the system calibrated using direct data in Figure 6.9. On the other hand, we also performed the calibration based on indirect data while enforcing energy conservation in the error model, as discussed in Section 6.2.2.2.

Figure 6.11: Results from the trained error model (N = 10) for the single-scale Lorenz 96 system with the energy conservation constraint, including (a) first four moments and autocorrelation, and (b) invariant measures.

Conclusions

Ensemble Kalman inversion

Protein dimerization networks as function approximators: ex-

Introduction

Mathematical setting

  • Chemical reactions: general case

Chemical Reactions: Specific Case

  • Defining parametric input-output maps from protein dimerization networks

Evaluating network expressivity by fitting to a target library

  • Random Networks: Expressivity and Versatility

Distribution shifts due to enforced constraints

Example constraint-based particle update

Example percentage map of constraint violations

Evaluation of unconstrained EnKF for ICU glucose predictions

Evaluation of constrained EnKF for ICU glucose predictions

Nutrition function

T2DM parameter estimation and uncertainty quantification

Patient 1

T2DM patient comparison

ICU Patient 4 predictions

ICU parameter estimation and uncertainty quantification

ICU parameter estimation over time

ICU forecasting

ICU forecasting

ICU nutrition function

Example residual error

Hybrid model performance vs model error

Hybrid model performance vs trajectory length

Hybrid model performance vs model complexity

Hybrid model performance vs sample rate

Hybrid model performance for L96MS

Learnt residual model error for L96MS

Learning L63 from partial noisy observations (correct hidden dimension)

Learning L96MS from partial noisy observations

Learning 3DVAR gain from partial noisy observations of L96MS

Replicating L96MS invariant measure from partial noisy observations

Instabilities of L63 trajectories when embedded in 4 dimensions

T2DM Data Set

ICU Data Set

T2DM prediction uncertainties

T2DM prediction comparisons

ICU prediction uncertainties

ICU prediction comparisons

Model performance summary

Forecast RMSE computed over all possible 4 hour windows of the

Figures

Figure 2.1: The distribution of mean state updates when running EnKF with and without inequality constraints
Figure 2.3: Percentage map of the constraint violations, where each lower-bound constraint is represented by a row
Figure 2.4: Oh caption, my caption.
Figure 2.5: This figure demonstrates how constraints on the insulin states improve forecast accuracy and robustness, compared with the unconstrained EnKF forecast results shown in Figure 2.4
