Introduction
Background
From a Bayesian perspective, we can view (1) as our prior knowledge and (2) as our data; the two are held in balance, and uncertainty in one can be counterbalanced by confidence in the other to yield a similar posterior estimate. To test this hypothesis, we use the methodology proposed in Chapter 4 to learn specific feeding corrections to the minimal Bergman model using simulated data. Chapter 6 reviews and unifies a number of approaches to learning structural errors in dynamical systems models.
Contributions
Chapter 7 focuses on viewing protein dimerization networks in biology through the mathematical lens of function approximation theory. Chapter 3 is taken directly from [6] (submitted to Chaos in 2023) and introduces a simple stochastic differential equation (SDE) model for blood glucose dynamics.
Notation
G_b (mg/dl) represents the basal glucose (i.e. the average of the unforced process), γ (1/min) is the decay rate for the exponential mean reversion, β (mg/(dl·U)) is a proportionality constant for the linear effect of insulin-based IV glucose removal, and σ controls the variance of the oscillations described by W(t). In this work, we only consider the autocorrelation of the same element of the vector x, i.e. x_i(t)x_i(0).
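The parameters above are consistent with a mean-reverting SDE of the form dG = −γ(G − G_b) dt − β I(t) dt + σ dW(t). As an illustrative sketch (this exact form, the parameter values, and the function names below are assumptions, not the model as stated in Chapter 3), an Euler–Maruyama discretization might look like:

```python
import numpy as np

def simulate_glucose(G_b=100.0, gamma=0.05, beta=50.0, sigma=1.0,
                     insulin=lambda t: 0.0, T=600.0, dt=1.0, seed=0):
    """Euler-Maruyama simulation of a hypothetical mean-reverting glucose SDE:
    dG = -gamma*(G - G_b) dt - beta*I(t) dt + sigma dW(t)."""
    rng = np.random.default_rng(seed)
    n = int(T / dt)
    G = np.empty(n + 1)
    G[0] = G_b                       # start at the basal value
    for k in range(n):
        t = k * dt
        drift = -gamma * (G[k] - G_b) - beta * insulin(t)
        G[k + 1] = G[k] + drift * dt + sigma * np.sqrt(dt) * rng.standard_normal()
    return G

path = simulate_glucose()
```

With insulin set to zero the path fluctuates around the basal value G_b, with stationary standard deviation σ/√(2γ).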
Ensemble Kalman methods with constraints
Introduction
- Overview
- Literature Review
- Our Contribution
Standard ensemble Kalman methods are built on a quadratic optimization problem that summarizes the relative strengths of confidence in model and data predictions; these optimization problems have clear analytical solutions. In this setting, there are three main methodologies: the extended Kalman filter, the unscented Kalman filter, and the ensemble Kalman filter.
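As a minimal sketch of this quadratic-optimization view (the variable names and the perturbed-observation variant below are our assumptions, not the thesis's formulation): the analysis step minimizes a sum of data-misfit and background-misfit terms, and the EnKF implements the closed-form minimizer with the ensemble's empirical covariance.

```python
import numpy as np

def enkf_analysis(ensemble, y, H, R, seed=0):
    """One EnKF analysis step with perturbed observations.
    ensemble: (N, d) forecast members; y: (p,) observation;
    H: (p, d) observation operator; R: (p, p) observation-noise covariance."""
    rng = np.random.default_rng(seed)
    N, d = ensemble.shape
    C = np.cov(ensemble.T, ddof=1)                  # empirical covariance
    S = H @ C @ H.T + R
    K = C @ H.T @ np.linalg.inv(S)                  # Kalman gain
    perturbed = y + rng.multivariate_normal(np.zeros(len(y)), R, size=N)
    innovations = perturbed - ensemble @ H.T
    return ensemble + innovations @ K.T             # closed-form minimizer

# Demo: scalar observation of the first component of a 2-d state
rng = np.random.default_rng(1)
forecast = rng.standard_normal((500, 2))
H = np.array([[1.0, 0.0]])
R = np.array([[0.5]])
analysis = enkf_analysis(forecast, np.array([5.0]), H, R)
```

The analysis mean of the observed component moves from the forecast mean toward the observation, weighted by the relative confidence encoded in C and R.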
Data Assimilation
Sections 2.2 to 2.4 describe ensemble Kalman methodology for state estimation, with and without constraints. One of the most popular such methods, and the one we focus on, is the Ensemble Kalman Filter (EnKF) [128], which is attractive due to its derivative-free nature.
Ensemble Kalman Filter
An equivalent formulation of the minimization problem is now given, using a penalized Lagrangian approach to incorporate the property that the solution of the optimization problem lies in the range of the empirical covariance. This perspective is particularly useful when additional constraints are imposed on the solution of the optimization problem.
Constrained Ensemble Kalman Filter
Assume that the constraint sets of (2.15) and (2.16) are nonempty; then v* exists and is unique, and for all τ > 0, v_τ exists and is unique. To prove the existence and uniqueness of the solution of (2.15), notice that it can be reformulated as follows.
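The role of the penalty parameter τ can be illustrated with a toy problem (a hypothetical nonnegativity constraint; objective (2.15) itself is not reproduced here): a quadratic penalty weighted by 1/τ drives the minimizer v_τ toward the feasible set as τ → 0.

```python
import numpy as np

def penalized_solution(m, C, y, H, R, tau, steps=5000, lr=0.01):
    """Gradient descent on
    J_tau(v) = 0.5*|y - Hv|^2_R + 0.5*|v - m|^2_C + (1/(2*tau))*|min(v, 0)|^2,
    where the last term is a quadratic penalty enforcing v >= 0 as tau -> 0."""
    R_inv = np.linalg.inv(R)
    C_inv = np.linalg.inv(C)
    v = np.array(m, dtype=float)
    for _ in range(steps):
        grad = -H.T @ R_inv @ (y - H @ v) + C_inv @ (v - m) + np.minimum(v, 0.0) / tau
        v = v - lr * grad
    return v

# Toy example: prior mean has a negative first component, data pulls it lower still
m = np.array([-1.0, 2.0])
y = np.array([-2.0, 3.0])
I2 = np.eye(2)
v_tau = penalized_solution(m, I2, y, I2, I2, tau=0.01)
```

The constrained component of v_τ ends up only slightly negative (order τ), while the unconstrained component takes the usual unconstrained value.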
Blood glucose forecasting experiments
- Type 2 diabetes self-monitoring data
- Intensive care unit patient dataset
- Ultradian model of glucose-insulin dynamics
The upper half shows the dynamics of the observed component (model solutions in blue; observations in yellow); the bottom half shows the expanded state space (both hidden components are shown in gray). The upper half shows the dynamics of the observed component (model solutions in blue; observations in yellow); the lower half shows the expanded state space (all 10 hidden components are shown in gray).
SDE-based reduced-order model of glucose-insulin dynamics
Introduction
- Clinical settings
- Advantages of a linear SDE model
Another possibility is that the additional expressivity of the nonlinear models is not of the right kind and thus does not offer much additional predictive advantage (despite clear mechanistic validity). Comparison of fitting data for T2DM and intensive care patients reveals interesting structural differences in their glucose regulation.
Literature Review
In Section 3.4, we construct the framework for formulating the parameter estimation problem and its solution. Section 3.6 presents the numerical results on parameter estimation and prediction, together with some uncertainty quantification (UQ) results, separately for T2DM and ICU settings.
Modeling framework
- Model construction
- Event-Time Model
- T2DM
- ICU
We use simple models for the meal function m(t) and the insulin delivery function I(t) (defined in Section 3.3.2) that allow explicit solution of the continuous-time model between events. Recall once again that G_b is the basal glucose value.
Parameter Estimation
- Bayesian Formulation
- Optimization
- MCMC
Thus we obtain the probability of the data given the glucose time series and the parameters. The distribution P(θ|y), for example, is the conditional distribution of the random model parameters θ given the data y.
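A hedged sketch of sampling P(θ|y) with a random-walk Metropolis scheme (the toy standard-normal log-posterior below stands in for the actual likelihood and prior, which are not reproduced here):

```python
import numpy as np

def metropolis(log_post, theta0, n_samples=5000, step=0.5, seed=0):
    """Random-walk Metropolis targeting P(theta|y), given an
    unnormalized log posterior log P(y|theta) + log P(theta)."""
    rng = np.random.default_rng(seed)
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    lp = log_post(theta)
    chain = np.empty((n_samples, theta.size))
    for i in range(n_samples):
        proposal = theta + step * rng.standard_normal(theta.shape)
        lp_prop = log_post(proposal)
        if np.log(rng.uniform()) < lp_prop - lp:   # Metropolis accept/reject
            theta, lp = proposal, lp_prop
        chain[i] = theta
    return chain

# Toy posterior: a standard normal, started away from the mode
chain = metropolis(lambda th: -0.5 * float(th @ th), [3.0])
```

After discarding a burn-in, the chain's empirical mean and standard deviation approximate those of the target posterior.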
Methods, Datasets, and Experimental Design
- T2DM
- Model, Parameters, and Dataset
- Forecasting
- ICU
- Model, Parameters, and Dataset
- Parameter Estimation and Uncertainty Quantification
- Evaluation Metrics
For each patient, the dataset consists of meal times, meal glucose, and BG measurements along with measurement times. Then we select the right endpoint of the test series, which will be the next BG measurement time.
Numerical Results
- T2DM
- Parameter Estimation
- Forecasting
- Comparison of Forecasting Accuracy with LDP
- ICU
- Parameter Estimation
- Forecasting
- Comparison of Forecasting Accuracy With The
Figure 3.5 also shows that the time evolution of the estimated parameters is realistic within the ICU context. Furthermore, due to the non-stationary and sparse nature of the ICU data, it is more difficult to accurately estimate some model parameters.
Conclusion
Thus, the proposed methodology is an adjustment of the empirical risk minimization described in the previous section. In the discrete-time framing, larger values of the step size can magnify the complexity of the mismatch Ψ_0(x) − Ψ†(x).
Machine learning of model error
Introduction
- Background and Literature Review
- Data-Driven Modeling of Dynamical Systems
- Hybrid Mechanistic and Data-Driven Modeling
- Non-Markovian Data-Driven Modeling
- Noisy Observations and Data Assimilation
- Applications of Data-Driven Modeling
- Our Contributions
These methods are inherently tied to discrete-time representations of the data and, while highly successful in many applied contexts, are of less value when the goal of data-driven learning is to fit continuous-time models; this is a desirable modeling goal in many environments. We provide an overarching framework for learning model errors from (potentially noisy) data in dynamic system settings, studying both discrete- and continuous-time models, along with both memoryless (Markovian) and memory-dependent representations of the model error.
Dynamical Systems Setting
- Continuous-Time
- Discrete-Time
- Partially Observed Systems (Continuous-Time)
- Partially Observed Systems (Discrete-Time)
However, these models are often simplifications of the true system and can therefore be improved with data-driven approaches. Our goal is to use machine learning to find a Markovian model in which x is part of the state variable, using our nominal knowledge f_0 and observations of a trajectory {x(t)}.
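As a sketch of this Markovian learning problem (the one-dimensional dynamics, nominal model f_0, and polynomial hypothesis class below are illustrative assumptions), one can estimate the residual m(x) ≈ dx/dt − f_0(x) from a trajectory and regress it:

```python
import numpy as np

def fit_residual(x, dt, f0, degree=5):
    """Fit the Markovian model error m(x) ~ dx/dt - f0(x) with a polynomial,
    using forward-difference estimates of the time derivative."""
    dxdt = (x[1:] - x[:-1]) / dt
    residual = dxdt - f0(x[:-1])          # the part of the dynamics f0 cannot explain
    return np.poly1d(np.polyfit(x[:-1], residual, degree))

# Synthetic truth: xdot = -x + 0.5*sin(2x); nominal model: f0(x) = -x
dt = 0.001
n = 20000
x = np.empty(n)
x[0] = 1.5
for k in range(n - 1):
    x[k + 1] = x[k] + dt * (-x[k] + 0.5 * np.sin(2.0 * x[k]))
m_hat = fit_residual(x, dt, lambda u: -u)
```

On the range covered by the trajectory, the fitted residual closely matches the true missing term 0.5·sin(2x).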
Statistical Learning for Ergodic Dynamical Systems
We can use these definitions to frame the problem of learning model errors in the language of statistical learning [416]. In Section 4.5.2 we return to the study of excess risk and generalization error in the context of linear (in terms of parametric dependence) models for m†, and under ergodicity assumptions on the data-generating process.
Parameterization of the Loss Function
- Continuous-Time Markovian Learning
- Discrete-Time Markovian Learning
- Continuous-Time Non-Markovian Learning
- Optimization; Hard Constraint For Missing Dynamics
- Optimization; Weak Constraint For Missing Dynamics
- Optimization; Data Assimilation For Missing Dynamics
- Discrete-Time Non-Markovian Learning
We have presented a machine learning framework in the continuous-time Markovian setting, but it can be adapted to discrete-time and non-Markovian settings. The recent paper [157] proposes an interesting, and more computationally efficient, approach to model-error learning in the presence of memory.
Underpinning Theory
- Markovian Modeling with Random Feature Maps
- Continuous-Time
- Discrete-Time
- Learning Theory for Markovian Models with Linear Hypothesis Classes
- Non-Markovian Modeling with Recurrent Neural Networks
- Linear Coupling
Finally, we note the dependence of the risk and generalization error bounds on the size of the model error. The proof is provided in Section 4.8.2; it is a direct consequence of the approximation power of f_1, f_2 and Gronwall's lemma.
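A minimal sketch of the random-feature construction underlying these results (the feature distribution, activation, and regularization below are illustrative choices, not the thesis's):

```python
import numpy as np

def random_feature_fit(X, y, n_features=200, reg=1e-6, seed=0):
    """Fit f(x) ~ sum_j c_j * tanh(w_j . x + b_j): the random inner weights
    (w_j, b_j) stay fixed; only the outer coefficients c are learned, via a
    Tikhonov-regularized linear least-squares problem."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n_features, X.shape[1]))
    b = rng.uniform(-np.pi, np.pi, size=n_features)
    Phi = np.tanh(X @ W.T + b)                          # feature matrix
    c = np.linalg.solve(Phi.T @ Phi + reg * np.eye(n_features), Phi.T @ y)
    return lambda Xq: np.tanh(Xq @ W.T + b) @ c

# Toy target on [-1, 1]
X = np.linspace(-1.0, 1.0, 400).reshape(-1, 1)
y = np.sin(3.0 * X[:, 0])
predict = random_feature_fit(X, y)
```

Because only the outer coefficients are trained, the optimization stays linear even though the hypothesis class is nonlinear in x.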
Numerical Experiments
- Summary of Findings from Numerical Experiments
- Learning Markovian Model Errors from Noise-Free Data
- Experimental Set-Up
- Evaluation Criteria
- Lorenz '63 (L63)
- Lorenz '96 Multiscale (L96MS) System
- Learning From Partial, Noisy Observations
- Lorenz '63
- Lorenz '96 Multiscale (ε = 2⁻¹)
- Initializing and Stabilizing the RNN
Autocorrelation: We compare the autocorrelation function (ACF) with respect to the invariant distribution of the true and learned models. We now discuss the implications of embedding the true dynamics in a higher-dimensional system, in the specific context of the embedded system (4.38).
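The ACF with respect to the invariant distribution can be estimated from a long trajectory with the standard empirical estimator; a sketch (the normalization convention is our choice):

```python
import numpy as np

def acf(x, max_lag):
    """Empirical autocorrelation of a scalar series, normalized so acf[0] = 1."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    c0 = np.dot(x, x) / len(x)                     # lag-0 autocovariance
    return np.array([np.dot(x[:len(x) - k], x[k:]) / (len(x) * c0)
                     for k in range(max_lag + 1)])

# Demo: AR(1) series with known lag-1 autocorrelation 0.9
rng = np.random.default_rng(0)
n = 20000
series = np.zeros(n)
for k in range(n - 1):
    series[k + 1] = 0.9 * series[k] + rng.standard_normal()
rho = acf(series, max_lag=5)
```

Comparing such estimates for trajectories of the true and learned models is one way to assess agreement of their invariant statistics.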
Conclusions
The bottom figure shows the divergence between the numerically integrated model error term m(t) and the state-dependent term m†; this growing discrepancy is probably responsible for the eventual collapse of the 4-dimensional system. We prove universal approximation for continuous-time hybrid RNNs and demonstrate successful implementation of the methodology.
Supplementary Information
- Proof of Excess Risk/Generalization Error Theorem
- Proof of Continuous-Time ODE Approximation Theorem
- Proof of Continuous-Time RNN Approximation Theorem
- Random Feature Approximation
- Derivation of Tikhonov-Regularized Linear Inverse Problem
Note that F is L-Lipschitz in the maximum norm on the ball B(0, 2r), for some L associated with the Lipschitz continuity of f_0, f_1 and f_2. Note that F is L-Lipschitz in the maximum norm, for some L related to the Lipschitz continuity of f_0, the approximating parameterization, and the regularity of the nonlinear activation function σ.
Abstract
Introduction
Standard models, such as Dalla Man, Rizza and Cobelli [99], focus on the glycemic impact of carbohydrates in a meal: carbohydrates are broken down into glucose molecules and then absorbed into the blood. However, these models typically ignore other macronutrients, such as fat, fiber, and protein, which are known to contribute substantially to the amount and timing of glucose uptake into the blood.
Background on modelling glucose-insulin dynamics
However, this approach introduces additional hand-crafted parameters to interpret quantities that cannot be observed outside a laboratory setting, such as gastric glucose concentration over time after a meal. Instead of deriving u_G from a complex model of the human body, this approach represents u_G directly using a parametric function fitted to the data.
Phenomenologically modelling the absorption rate
A simple heuristic choice is a square pulse u_i(t) = c_i · 1_[0,w)(t − t_i)/w, where the width w of the square is a free parameter and c_i ∈ ℝ is the amount of glucose produced from the meal. Having defined our parametric function, we now discuss how to learn its parameters in an environment that is realistic for settings outside the laboratory.
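A sketch of this square pulse (the names square_pulse, c_i, and w mirror the notation above; the numbers in the demo are hypothetical):

```python
import numpy as np

def square_pulse(t, t_i, c_i, w):
    """Square absorption pulse: c_i / w on [t_i, t_i + w), zero elsewhere,
    so the pulse integrates to the total meal glucose c_i."""
    t = np.asarray(t, dtype=float)
    return np.where((t >= t_i) & (t < t_i + w), c_i / w, 0.0)

# A meal of 50 units at t = 30 min, absorbed over w = 60 min
t = np.linspace(0.0, 180.0, 1801)
u = square_pulse(t, 30.0, 50.0, 60.0)
```

Dividing by w makes the total absorbed amount independent of the pulse width, so c_i and w can be learned separately.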
Experiments
Our neural model can get very close to the ground truth u_G, especially in the tails, as shown in Figure 5.1 (left). This results in significantly better predictions: our neural model accurately tracks the true baseline glucose values and absorption rates, even extrapolating to periods much longer than those seen during training.
Discussion
An additional complication when using direct data is ensuring the stability of the dynamical system with the calibrated error model δ(·). Given data at J different points z^(j) for j = 1, 2, …, J, the mean of the error model can be written as a linear combination of basis functions.
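A sketch of fitting such a linear combination of basis functions to direct data (the cubic dictionary and the data below are hypothetical):

```python
import numpy as np

def fit_dictionary(z, delta, basis):
    """Least-squares fit of m(z) = sum_k a_k * phi_k(z) to direct data
    {(z_j, delta_j)}, j = 1..J, for a fixed dictionary of basis functions."""
    Phi = np.column_stack([phi(z) for phi in basis])    # (J, K) design matrix
    a, *_ = np.linalg.lstsq(Phi, delta, rcond=None)
    mean_fn = lambda zq: np.column_stack([phi(zq) for phi in basis]) @ a
    return a, mean_fn

# Hypothetical direct data generated by delta(z) = 1.5 z - 0.5 z^3
z = np.linspace(-2.0, 2.0, 50)
delta = 1.5 * z - 0.5 * z**3
basis = [lambda s: s, lambda s: s**2, lambda s: s**3]
coeffs, mean_fn = fit_dictionary(z, delta, basis)
```

When the true error lies in the span of the dictionary, the least-squares fit recovers the coefficients exactly; otherwise the dictionary's expressivity limits the error model, as discussed below.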
Learning About Structural Errors in Dynamical Systems
Introduction
The effect of unresolved scales is modeled by subgrid-scale models, such as the classic Smagorinsky model [390], which depends on empirical parameters P (e.g., the Smagorinsky coefficient). Here we present different ways of constructing models for structural errors and show how we can learn about structural errors from direct or indirect data in the absence of derivatives of the dynamical system.
Calibrating Error Models
- Data Availability
- Direct Data
- Indirect Data
- Methods of Calibration
- Gradient-based or Derivative-free Optimization
- Enforcing Constraints
Using direct data {z, δ(z)} for calibration leads to a regression problem that, for a given parameterization of the error model, can be solved by standard methods (e.g., least squares for dictionary learning, gradient-descent methods for neural networks). There are different types of constraints that can be enforced when calibrating an error model.
Constructing Error Models
- Dictionary Learning
- Gaussian Processes
- Neural Networks
- Stochastic Extension
- Representing Non-local Effects
Taking the additive error model ẋ = f(x) + δ(x; θ) as an example, the function R corresponding to the energy-conservation constraint can be written explicitly as follows. On the other hand, this approach can be too restrictive when the dictionary of basis functions {φ_i} is misspecified, resulting in an insufficiently expressive error model.
Lorenz 96 Systems as Illustrative Examples
- Lorenz Models Considered
- Multiscale Lorenz 96 Model
- Single-scale Lorenz 96 Model
- Numerical Results for Lorenz Models
- Lorenz 96 Multi-scale Model
- Lorenz 96 Single-scale Model
We train a fully connected neural network as the error model δ(·) using indirect data (the first to fourth moments of the state variable x_i and the autocorrelation of x_i). As shown in Fig. 6.11, the calibrated error model with energy conservation leads to a modeled system that fits the training data better and achieves good agreement with the invariant measure of the true system.
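A sketch of generating this kind of indirect data from the single-scale Lorenz 96 model (the forcing F = 8, the discretization, and the choice of moments below are assumptions):

```python
import numpy as np

def l96_step(x, F=8.0, dt=0.005):
    """One RK4 step of the single-scale Lorenz 96 model
    dx_k/dt = (x_{k+1} - x_{k-2}) x_{k-1} - x_k + F (indices cyclic)."""
    def rhs(x):
        return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F
    k1 = rhs(x); k2 = rhs(x + 0.5 * dt * k1)
    k3 = rhs(x + 0.5 * dt * k2); k4 = rhs(x + dt * k3)
    return x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6

def indirect_data(traj):
    """First-to-fourth moments of one coordinate: summary statistics of the
    kind used as indirect calibration data."""
    x = traj[:, 0]
    return np.array([np.mean(x**p) for p in (1, 2, 3, 4)])

rng = np.random.default_rng(0)
x = rng.standard_normal(40)
snapshots = []
for n in range(20000):
    x = l96_step(x)
    if n > 4000:                    # discard the transient
        snapshots.append(x.copy())
moments = indirect_data(np.array(snapshots))
```

Calibration then adjusts the error model so that the modeled system reproduces such statistics, rather than matching state trajectories pointwise.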
Human Glucose-Insulin Model as Illustrative Example
Without enforcing energy conservation on the error model, we can see in Figure 6.10b that the performance of the calibrated model is comparable to that of the system calibrated using direct data in Figure 6.9. On the other hand, we also performed the calibration based on indirect data while enforcing energy conservation on the error model, as discussed in Section 6.2.2.2.
Conclusions
Ensemble Kalman inversion
Protein dimerization networks as function approximators: ex-
Introduction
Mathematical setting
- Chemical reactions: general case
Chemical Reactions: Specific Case
- Defining parametric input-output maps from protein dimerization networks
Evaluating network expressivity by fitting to a target library
- Random Networks: Expressivity and Versatility
Distribution shifts due to enforced constraints
Example constraint-based particle update
Example percentage map of constraint violations
Evaluation of unconstrained EnKF for ICU glucose predictions
Evaluation of constrained EnKF for ICU glucose predictions
Nutrition function
T2DM parameter estimation and uncertainty quantification
Patient 1
T2DM patient comparison
ICU Patient 4 predictions
ICU parameter estimation and uncertainty quantification
ICU parameter estimation over time
ICU forecasting
ICU forecasting
ICU nutrition function
Example residual error
Hybrid model performance vs model error
Hybrid model performance vs trajectory length
Hybrid model performance vs model complexity
Hybrid model performance vs sample rate
Hybrid model performance for L96MS
Learnt residual model error for L96MS
- Learning L63 from partial noisy observations (correct hidden dimension)
Learning L96MS from partial noisy observations
Learning 3DVAR gain from partial noisy observations of L96MS
- Replicating L96MS invariant measure from partial noisy observations
Instabilities of L63 trajectories when embedded in 4 dimensions
T2DM Data Set
ICU Data Set
T2DM prediction uncertainties
T2DM prediction comparisons
ICU prediction uncertainties
ICU prediction comparisons
Model performance summary
Forecast RMSE computed over all possible 4 hour windows of the