Incomplete Models and Noisy Data


Academic year: 2023


Introduction

Background

From a Bayesian perspective, we can view (1) as our prior knowledge and (2) as our data—the two are held in balance, and uncertainty in one can be counterbalanced by confidence in the other to yield a similar posterior estimate. To test this hypothesis, we use the methodology proposed in Chapter 4 to learn specific feeding corrections to the minimal Bergman model using simulated data. Chapter 6 reviews and unifies a number of approaches to learning structural errors in dynamical systems models.

Contributions

Chapter 7 focuses on viewing protein dimerization networks in biology through the mathematical lens of function approximation theory. Chapter 3 is taken directly from [6] (submitted to Chaos in 2023), and focuses on the introduction of a simple stochastic differential equation (SDE) model for blood glucose dynamics.

Notation

šŗš‘ (mg/dl) represents the basal glucose (i.e. the average of the unforced process), š›¾ (1/min) is the decay rate for the exponential mean reversal, š›½(mg/(dl*U)) is a proportionality constant for the linear effect of insulin-based IV glucose removal, andšœŽ controls the variance of the oscillations described byš‘Š(š‘”). In this work, we only consider the autocorrelation of the same element in the vectorš‘„, i.e. š‘„š‘˜(š‘”)š‘„š‘˜(0).

Ensemble Kalman methods with constraints

Introduction

  • Overview
  • Literature Review
  • Our Contribution

Standard ensemble Kalman methods solve a quadratic optimization problem that balances the relative confidence in model and data predictions; these optimization problems have closed-form analytical solutions. Within this setting, there are three main methodologies: the extended Kalman filter, the unscented Kalman filter, and the ensemble Kalman filter.
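As a minimal sketch of the quadratic problem mentioned above: the analysis step minimizes a sum of data-misfit and prior-misfit terms, and its closed-form minimizer is the familiar Kalman update. The function below is illustrative (variable names are mine, not the thesis notation) and assumes Gaussian observation noise with covariance Gamma.

```python
import numpy as np

def kalman_update(m, C, y, H, Gamma):
    """Closed-form minimizer of the quadratic objective
    J(v) = |y - H v|^2_{Gamma^-1} + |v - m|^2_{C^-1},
    i.e. the standard Kalman analysis step."""
    S = H @ C @ H.T + Gamma              # innovation covariance
    K = C @ H.T @ np.linalg.inv(S)       # Kalman gain
    m_new = m + K @ (y - H @ m)          # updated mean
    C_new = (np.eye(len(m)) - K @ H) @ C # updated covariance
    return m_new, C_new
```

Ensemble Kalman methods replace C with an empirical covariance computed from an ensemble of particles, which is what makes them derivative-free.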

Data Assimilation

Sections 2.2 to 2.4 describe the ensemble Kalman filter (EnKF) methodology for state estimation, with and without constraints. Among the most popular data assimilation methods, and the one we focus on, is the Ensemble Kalman Filter (EnKF) [128], which can be very beneficial due to its derivative-free nature.

Ensemble Kalman Filter

An equivalent formulation of the minimization problem is now given using a penalized Lagrangian approach to incorporate the property that the solution of the optimization problem lies in the domain of the empirical covariance. This perspective is particularly useful when additional constraints are imposed on the solution of the optimization problem.

Constrained Ensemble Kalman Filter

Assume that the constraint sets of (2.15) and (2.16) are non-empty; then v* exists and is unique, and for all ε > 0, v_ε exists and is unique. To prove the existence and uniqueness of the solution of (2.15), notice that it can be reformulated.

Blood glucose forecasting experiments

  • Type 2 diabetes self-monitoring data
  • Intensive care unit patient dataset
  • Ultradian model of glucose-insulin dynamics

The upper half shows the dynamics of the observed component (model solutions in blue; observations in yellow); the bottom half shows the expanded state space (both hidden components are shown in gray). The upper half shows the dynamics of the observed component (model solutions in blue; observations in yellow); the lower half shows the expanded state space (all 10 hidden components are shown in gray).

Figure 2.1 compares the overall distribution of updated state means over time when running EnKF with and without these state constraints.

SDE-based reduced-order model of glucose-insulin dynamics

Introduction

  • Clinical settings
  • Advantages of a linear SDE model

Another possibility is that the additional expressivity of the nonlinear models is not of the right kind and thus does not offer much additional predictive advantage (despite clear mechanistic validity). Comparison of fitting data for T2DM and intensive care patients reveals interesting structural differences in their glucose regulation.

Literature Review

In Section 3.4, we construct the framework for formulating the parameter estimation problem and its solution. Section 3.6 presents the numerical results on parameter estimation and prediction, together with some uncertainty quantification (UQ) results, separately for T2DM and ICU settings.

Modeling framework

  • Model construction
  • Event-Time Model
  • T2DM
  • ICU

We use simple models for the meal function m(t) and the insulin delivery function I(t) (defined in Section 3.3.2) that allow explicit solution of the continuous-time model between events. Recall once again that G_b is the basal glucose value.
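The explicit between-event solution can be sketched as follows for the simplest homogeneous case (no active meal or insulin forcing): glucose relaxes exponentially toward the basal value G_b at rate γ. Parameter values here are illustrative, not from the thesis.

```python
import numpy as np

def glucose_between_events(G0, t, G_b=100.0, gamma=0.05):
    """Deterministic part of the model between events:
    G(t) = G_b + (G0 - G_b) * exp(-gamma * t),
    i.e. exponential mean reversion toward the basal glucose G_b."""
    return G_b + (G0 - G_b) * np.exp(-gamma * t)
```

Because this solution is available in closed form, the model can be advanced from one event time to the next without numerical integration.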

Parameter Estimation

  • Bayesian Formulation
  • Optimization
  • MCMC

Thus we obtain the likelihood of the data given the glucose time series and the parameters. The distribution P(θ|y) is the conditional distribution of the random model parameters θ given the data y.
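Written out as a standard statement of Bayes' rule for parameters θ and data y:

```latex
P(\theta \mid y) \;=\; \frac{P(y \mid \theta)\, P(\theta)}{P(y)} \;\propto\; P(y \mid \theta)\, P(\theta).
```

The two approaches listed above then correspond to the two standard ways of using this posterior: optimization maximizes it (a MAP estimate), while MCMC draws samples from it for uncertainty quantification.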

Methods, Datasets, and Experimental Design

  • T2DM
    • Model, Parameters, and Dataset
    • Forecasting
  • ICU
    • Model, Parameters, and Dataset
    • Parameter Estimation and Uncertainty Quantification
  • Evaluation Metrics

For each patient, the dataset consists of meal times, meal glucose, and BG measurements along with measurement times. Then we select the right end point of the test series, which will be the next BG measurement time.

Table 3.2: Information about the dataset that is used in the ICU setting, collected from three ICU patients who are not T2DM

Numerical Results

  • T2DM
    • Parameter Estimation
    • Forecasting
    • Comparison of Forecasting Accuracy with LDP
  • ICU
    • Parameter Estimation
    • Forecasting
    • Comparison of Forecasting Accuracy With The

Figure 3.5 also shows that the time evolution of the estimated parameters is realistic within the ICU context. Furthermore, due to the non-stationary and sparse nature of the ICU data, it is more difficult to accurately estimate some model parameters.

Figure 3.1: Parameter estimation and uncertainty quantification in the T2DM setting.

Conclusion

š‘…(šœƒ); Thus, the proposed methodology is an adjustment of the empirical risk minimization described in the previous section. In the discrete time frame, larger values ​​of šœ– can magnify the complexity of the mismatch ĪØ0(š‘„) āˆ’ĪØā€ (š‘„).

Table 3.7: Forecasting results obtained with optimization approach for the MSG model with the nutrition function given in (3.26)

Machine learning of model error

Introduction

  • Background and Literature Review
    • Data-Driven Modeling of Dynamical Systems
    • Hybrid Mechanistic and Data-Driven Modeling
    • Non-Markovian Data-Driven Modeling
    • Noisy Observations and Data Assimilation
    • Applications of Data-Driven Modeling
  • Our Contributions

These methods are inherently tied to discrete-time representations of the data and, while highly successful in many applied contexts, are of less value when the goal of data-driven learning is to fit continuous-time models; this is a desirable modeling goal in many environments. We provide an overarching framework for learning model errors from (potentially noisy) data in dynamic system settings, studying both discrete- and continuous-time models, along with both memoryless (Markovian) and memory-dependent representations of the model error.

Dynamical Systems Setting

  • Continuous-Time
  • Discrete-Time
  • Partially Observed Systems (Continuous-Time)
  • Partially Observed Systems (Discrete-Time)

However, these models are often simplifications of the true system and can therefore be improved with data-driven approaches. Our goal is to use machine learning to find a Markov model in which x is part of the state variable, using our nominal knowledge f_0 and observations of a trajectory {x(t)}.
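A common concrete form of such a hybrid model, and one aligned with the random feature maps discussed later in this chapter, augments the nominal dynamics f_0 with a correction that is linear in its trainable parameters. The sketch below is illustrative; the dimensions, feature count, and activation are arbitrary choices, not the thesis's settings.

```python
import numpy as np

def make_random_features(d, n_features=200, seed=0):
    """Random feature map phi(x) = tanh(W x + b) with fixed random W, b.
    A learned model error m(x; Theta) = Theta @ phi(x) is then linear
    in the trainable matrix Theta."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_features, d))
    b = rng.uniform(-np.pi, np.pi, n_features)
    return lambda x: np.tanh(W @ x + b)

def hybrid_rhs(x, f0, Theta, phi):
    """Right-hand side of the hybrid model dx/dt = f0(x) + Theta @ phi(x):
    nominal physics plus a learned Markovian correction."""
    return f0(x) + Theta @ phi(x)
```

Because the correction is linear in Theta, fitting it from residual data reduces to a (regularized) linear least-squares problem.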

Statistical Learning for Ergodic Dynamical Systems

We can use these definitions to frame the problem of learning model errors in the language of statistical learning [416]. In Section 4.5.2 we return to the study of excess risk and generalization error in the context of models for m† that are linear in their parametric dependence, under ergodicity assumptions on the data-generating process.
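The standard statistical-learning quantities referred to here can be written as follows (my notation: μ the invariant measure of the data-generating process, m† the true model error, m(·;θ) the hypothesis):

```latex
R(\theta) = \mathbb{E}_{x \sim \mu}\,\lVert m^\dagger(x) - m(x;\theta)\rVert^2,
\qquad
R_T(\theta) = \frac{1}{T}\int_0^T \lVert m^\dagger(x(t)) - m(x(t);\theta)\rVert^2\,dt .
```

The excess risk is then R(θ̂) − inf_θ R(θ), and the generalization error is the gap R(θ̂) − R_T(θ̂) between the population risk and its empirical (time-averaged) counterpart; ergodicity is what makes R_T a consistent proxy for R.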

Parameterization of the Loss Function

  • Continuous-Time Markovian Learning
  • Discrete-Time Markovian Learning
  • Continuous-Time Non-Markovian Learning
    • Optimization; Hard Constraint For Missing Dynamics
    • Optimization; Weak Constraint For Missing Dynamics
    • Optimization; Data Assimilation For Missing Dynamics
  • Discrete-Time Non-Markovian Learning

We have presented a machine learning framework in the continuous-time Markovian setting, but it can be adapted to discrete-time and non-Markovian settings. The recent paper [157] proposes an interesting, and more computationally efficient, approach to model error learning in the presence of memory.

Underpinning Theory

  • Markovian Modeling with Random Feature Maps
    • Continuous-Time
    • Discrete-Time
  • Learning Theory for Markovian Models with Linear Hypothesis Classes
  • Non-Markovian Modeling with Recurrent Neural Networks
    • Linear Coupling

Finally, we note the dependence of the risk and generalization error bounds on the size of the model error. The proof is provided in Section 4.8.2; it is a direct consequence of the approximation power of f_1, f_2 and the Gronwall lemma.

Numerical Experiments

  • Summary of Findings from Numerical Experiments
  • Learning Markovian Model Errors from Noise-Free Data
    • Experimental Set-Up
    • Evaluation Criteria
    • Lorenz ’63 (L63)
    • Lorenz ’96 Multiscale (L96MS) System
  • Learning From Partial, Noisy Observations
    • Lorenz ’63
    • Lorenz ’96 Multiscale ( šœ€ = 2 āˆ’1 )
  • Initializing and Stabilizing the RNN

Autocorrelation: We compare the autocorrelation function (ACF) with respect to the invariant distribution of the true and learned models. We now discuss the implications of embedding the true dynamics in a higher-dimensional system, in the specific context of the embedded system (4.38).
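Under the ergodicity assumptions used throughout this chapter, the ACF with respect to the invariant distribution can be estimated by a time average along a single long trajectory. A minimal sketch of that estimator:

```python
import numpy as np

def acf(x, max_lag):
    """Empirical autocorrelation <x(t) x(0)> of a scalar time series,
    normalized so that lag 0 equals 1, computed as an (assumed ergodic)
    time average over a single trajectory."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()          # center the series
    var = x.var()
    n = len(x)
    return np.array([np.dot(x[:n - l], x[l:]) / ((n - l) * var)
                     for l in range(max_lag + 1)])
```

Comparing acf(true_trajectory, L) against acf(learned_trajectory, L) gives the kind of invariant-statistics diagnostic described above.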

Figure 4.1: Here we visualize an example of the function m_1 in (4.41), which is obtained as a single random draw from a zero-mean Gaussian process mapping R^3 → R^3

Conclusions

The bottom figure shows the divergence between the numerically integrated model error term m(t) and the state-dependent term m†; this growing discrepancy is likely responsible for the eventual collapse of the 4-dimensional system. We prove universal approximation for continuous-time hybrid RNNs and demonstrate successful implementation of the methodology.

Figure 4.13: Here we show that the embedded 3-dimensional manifold of L63, within the 4-dimensional system given by (4.45), is unstable

Supplementary Information

  • Proof of Excess Risk/Generalization Error Theorem
  • Proof of Continuous-Time ODE Approximation Theorem
  • Proof of Continuous-Time RNN Approximation Theorem
  • Random Feature Approximation
  • Derivation of Tikhonov-Regularized Linear Inverse Problem

Note that š¹ is šæāˆ’Lipschitz in maximum norm on šµ(0.2šœŒš‘‡), for some šæ associated with Lipschitz continuity š‘“0, š‘“1 and š‘“2. Note that š¹ šæāˆ’Lipschitz in the maximal norm, for some šæ is related to the Lipschitz continuity š‘“0, the approximation parameterization šœƒš›æ and the regularity of the nonlinear activation functionšœŽ.

Abstract

Introduction

Standard models, such as Dalla Man, Rizza and Cobelli [99], focus on the glycemic impact of carbohydrates in a meal: carbohydrates are broken down into glucose molecules and then absorbed into the blood. However, these models typically ignore other macronutrients, such as fat, fiber, and protein, which are known to contribute substantially to the amount and timing of glucose uptake into the blood.

Background on modelling glucose-insulin dynamics

However, this approach introduces additional hand-crafted parameters to interpret quantities that cannot be observed outside of a laboratory setting, such as gastric glucose concentration over time after a meal. Instead of deriving u_G from a complex model of the human body, this approach represents u_G directly using a parametric function fitted from the data.

Phenomenologically modelling the absorption rate

A simple heuristic choice is a square function a_i(t) = g_i 1_[0,w)(t − t_i)/w, where the width w of the square is a free parameter and g_i ∈ R is the amount of glucose produced from the meal. Having defined our parametric function, we now discuss how to learn the parameters in an environment that is realistic outside the laboratory.
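The square-pulse choice can be sketched directly; note the normalization by w means each pulse integrates to g_i, so the total absorbed glucose is independent of the chosen width. Function and argument names here are mine.

```python
import numpy as np

def square_absorption(t, meal_times, meal_glucose, w=30.0):
    """Square-pulse absorption rate: meal i contributes
    a_i(t) = g_i * 1_[0, w)(t - t_i) / w,
    so each pulse has height g_i / w and integrates to g_i."""
    t = np.atleast_1d(t).astype(float)
    u = np.zeros_like(t)
    for t_i, g_i in zip(meal_times, meal_glucose):
        u += np.where((t >= t_i) & (t < t_i + w), g_i / w, 0.0)
    return u
```

Richer parametric shapes (and the neural parameterization discussed in this chapter) can replace the square pulse while keeping the same unit-mass-per-meal convention.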

Experiments

Our neural model can get very close to the ground truth u_G, especially in the tails, as shown in Figure 5.1 (left). This results in significantly better predictions, and our neural model accurately tracks the true glucose values and absorption rates, even extrapolating to periods much longer than those seen during training.

Figure 5.1: Left: (Top) “Absorption templates” used to generate u_G. (Bottom) 5 samples of ground truth and learned u_G for meals from the test set

Discussion

An additional complication when using direct data is ensuring the stability of the dynamical system with the calibrated error model δ(·). Given data at J different points X^(j) for j = 1, 2, …, J, the mean of the error model can be written as a linear combination of basis functions.
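For a Gaussian process error model, that linear combination is the familiar kernel form m(x) = Σ_j α_j k(x, X^(j)). A minimal sketch, assuming a squared-exponential kernel and a small noise/jitter term (both my choices, not the thesis's):

```python
import numpy as np

def gp_mean(X_train, delta_train, X_query, lengthscale=1.0, noise=1e-6):
    """Posterior mean of a GP error model: a linear combination
    m(x) = sum_j alpha_j k(x, X^(j)) of kernel basis functions,
    with alpha = (K + noise*I)^{-1} delta."""
    def k(A, B):  # squared-exponential kernel matrix
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / lengthscale**2)
    K = k(X_train, X_train) + noise * np.eye(len(X_train))
    alpha = np.linalg.solve(K, delta_train)
    return k(X_query, X_train) @ alpha
```

With small noise the posterior mean interpolates the direct data {X^(j), δ(X^(j))}, while larger noise trades interpolation for smoothness, which can help with the stability concern mentioned above.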

Learning About Structural Errors in Dynamical Systems

Introduction

The effect of unresolved scales is modeled by subgrid-scale models, such as the classic Smagorinsky model [390], which depends on empirical parameters P (e.g., the Smagorinsky coefficient). Here we show different ways of constructing models for structural errors and show how we can learn about structural errors from direct or indirect data in the absence of derivatives of the dynamical system.

Figure 6.1: External and internal approaches to modeling structural errors in complex dynamical systems.

Calibrating Error Models

  • Data Availability
    • Direct Data
    • Indirect Data
  • Methods of Calibration
    • Gradient-based or Derivative-free Optimization
    • Enforcing Constraints

Using direct data {X, δ(X)} for calibration leads to a regression problem that can be solved by standard methods for a given parameterization of the error model (e.g., least squares for dictionary learning, gradient descent methods for neural networks). There are different types of constraints that can be enforced when calibrating an error model.
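The dictionary-learning case reduces to linear least squares: with δ(X; θ) = Σ_j θ_j φ_j(X), the direct data give a design matrix of dictionary evaluations. A sketch with Tikhonov regularization (the dictionary and regularization strength here are illustrative):

```python
import numpy as np

def fit_dictionary(X, delta, phis, reg=1e-8):
    """Least-squares fit of delta(X) ~ sum_j theta_j phi_j(X) from
    direct data {X, delta(X)}, via regularized normal equations."""
    Phi = np.column_stack([phi(X) for phi in phis])  # design matrix
    A = Phi.T @ Phi + reg * np.eye(Phi.shape[1])
    theta = np.linalg.solve(A, Phi.T @ delta)
    return theta
```

If the dictionary {φ_j} is well chosen this recovers the error model exactly; as noted below, a misspecified dictionary caps the expressivity of the fit, which motivates the Gaussian process and neural network alternatives.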

Constructing Error Models

  • Dictionary Learning
  • Gaussian Processes
  • Neural Networks
  • Stochastic Extension
  • Representing Non-local Effects

Taking the additive error model Ẋ = f(X) + δ(X; θ) as an example, the function R corresponding to the energy conservation constraint can be written explicitly. On the other hand, this approach can be too restrictive when the dictionary of basis functions {φ_j} is misspecified, resulting in an insufficiently expressive error model.

Lorenz 96 Systems as Illustrative Examples

  • Lorenz Models Considered
    • Multiscale Lorenz 96 Model
    • Single-scale Lorenz 96 Model
  • Numerical Results for Lorenz Models
    • Lorenz 96 Multi-scale Model
    • Lorenz 96 Single-scale Model

We train a fully connected neural network as the error model δ(·) using indirect data (the first to fourth moments of the state variable x_k and the autocorrelation of x_k). As shown in Fig. 6.11, the calibrated error model with energy conservation leads to a modeled system that fits the training data better and achieves good agreement with the invariant measure of the true system.

Figure 6.2: Direct training of the error model (N = 10) using a neural network, with results of (a) the trained error model and (b) invariant measures.

Human Glucose-Insulin Model as Illustrative Example

Without enforcing energy conservation in the error model, we can see in Figure 6.10b that the performance of the calibrated model is comparable to that of the system calibrated using direct data in Figure 6.9. On the other hand, we also performed the calibration based on indirect data while enforcing energy conservation in the error model, as discussed in Section 6.2.2.2.

Figure 6.11: Results from the trained error model (N = 10) for the single-scale Lorenz 96 system with the energy conservation constraint, including (a) first four moments and autocorrelation, and (b) invariant measures.

Conclusions

Ensemble Kalman inversion

Protein dimerization networks as function approximators: ex-

Introduction

Mathematical setting

  • Chemical reactions: general case

Chemical Reactions: Specific Case

  • Defining parametric input-output maps from protein dimerization networks

Evaluating network expressivity by fitting to a target library

  • Random Networks: Expressivity and Versatility

Distribution shifts due to enforced constraints

Example constraint-based particle update

Example percentage map of constraint violations

Evaluation of unconstrained EnKF for ICU glucose predictions

Evaluation of constrained EnKF for ICU glucose predictions

Nutrition function

T2DM parameter estimation and uncertainty quantification

Patient 1

T2DM patient comparison

ICU Patient 4 predictions

ICU parameter estimation and uncertainty quantification

ICU parameter estimation over time

ICU forecasting

ICU forecasting

ICU nutrition function

Example residual error

Hybrid model performance vs model error

Hybrid model performance vs trajectory length

Hybrid model performance vs model complexity

Hybrid model performance vs sample rate

Hybrid model performance for L96MS

Learnt residual model error for L96MS

Learning L63 from partial noisy observations (correct hidden dimension)

Learning L96MS from partial noisy observations

Learning 3DVAR gain from partial noisy observations of L96MS

Replicating L96MS invariant measure from partial noisy observations

Instabilities of L63 trajectories when embedded in 4 dimensions

T2DM Data Set

ICU Data Set

T2DM prediction uncertainties

T2DM prediction comparisons

ICU prediction uncertainties

ICU prediction comparisons

Model performance summary

Forecast RMSE computed over all possible 4 hour windows of the

Figures

Figure 2.1: The distribution of mean state updates when running EnKF with and without inequality constraints
Figure 2.3: Percentage map of the constraint violations, where each lower-bound constraint is represented by a row
Figure 2.4: Oh caption, my caption.
Figure 2.5: This figure demonstrates how constraints on the insulin states improve forecast accuracy and robustness, compared with the unconstrained EnKF forecast results shown in Figure 2.4
