An FVM designed for PDEs with discontinuous solutions is referred to as a shock capturing method. Rather than being derived from first principles, shock capturing methods have traditionally been designed using human intuition to create algorithms that then perform well empirically.
INTRODUCTION
Motivation
This approach saves computational cost without necessarily invalidating the rest of the solution (LeVeque et al., 2002). Typically, high-order numerical methods are used to solve these problems as they provide the best asymptotic error bounds (Deville et al., 2002).
Historical Context and Literature Review
- Shock Capturing Methods
- Overview of ML for PDEs and Fluid Mechanics
- ML for finite difference and similar methods
Xingjian et al. (2015) used a convolutional LSTM to predict short-term precipitation from radar maps. Mohan et al. (2019) developed a deep learning framework called Compressed Convolutional LSTM to reduce the dimensionality of turbulence.
Summary of contributions and outline
- WENO-NN
- FiniteNet
- WENO-TL
These methods use the idea of matching the truncation error of the discretization scheme using nonlinear expressions. We also demonstrate that training over long time horizons improves the numerical stability of the learned scheme.
BACKGROUND
PDE background
- Hyperbolic vs. Elliptic PDEs
- Advection Equation
- Inviscid Burgers’ Equation
- Kuramoto-Sivashinsky Equation
- Euler Equations
Once the discretization takes place, no inference can be made about what happens between the grid points, as this is completely determined by the initial state and not by the dynamics of the equation. Chaos for PDEs is analogous to chaos in dynamical systems: a small perturbation in the initial conditions drastically affects the time evolution of the system (Strogatz, 2001).
Numerical methods background
- The finite-difference method
- The finite-volume method
- Example
- Time stepping methods
- Lax Equivalence Theorem
- Modern shock capturing methods
- Lax-Friedrichs Flux Splitting
In a linear PDE, this is achieved if the total variation of the numerical solution is bounded as Δ𝑡 approaches 0. Shock capturing methods are intended to accomplish the difficult task of simulating PDEs with discontinuous solutions without special treatment of the discontinuities.
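The total-variation criterion is easy to state concretely. A minimal sketch of the discrete total variation, which a TVD scheme must not increase from one time step to the next:

```python
import numpy as np

def total_variation(u):
    """Discrete total variation TV(u) = sum_i |u_{i+1} - u_i|."""
    return np.sum(np.abs(np.diff(u)))

# A monotone profile and one with a spurious overshoot near a steep gradient:
smooth = np.array([0.0, 0.2, 0.5, 0.9, 1.0])
wiggly = np.array([0.0, 0.2, 0.5, 1.1, 1.0])

# The overshoot raises the total variation; a TVD scheme must not
# introduce such oscillations as the solution is advanced in time.
tv_smooth = total_variation(smooth)   # 1.0
tv_wiggly = total_variation(wiggly)   # larger than tv_smooth
```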
Machine Learning background
- Regression and Neural Networks
- Other network architectures
- Transfer Learning
Mathematically speaking, the difference can be shown by considering a fully connected neural network to have the input-output structure y_i = g(Θ^(i) y_{i-1} + Θ_0^(i)), where Θ_0^(i) represents a vector of neural network biases and g(·) is an activation function applied to each entry of the vector Θ^(i) y_{i-1} + Θ_0^(i). A convolutional neural network can be used to take advantage of spatially local structure in problems.
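The layer recursion above can be sketched in a few lines; all names and sizes here are illustrative, and the convolution is shown as the weight-shared, local counterpart of the dense matrix Θ:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def dense_layer(y_prev, theta, theta0, g=relu):
    """One fully connected layer: y_i = g(Theta^(i) y_{i-1} + Theta_0^(i))."""
    return g(theta @ y_prev + theta0)

rng = np.random.default_rng(0)
y0 = rng.standard_normal(5)          # input vector y_0
theta = rng.standard_normal((3, 5))  # dense weight matrix Theta^(1)
theta0 = rng.standard_normal(3)      # bias vector Theta_0^(1)

y1 = dense_layer(y0, theta, theta0)

# A convolutional layer is the same operation with a banded, weight-shared
# Theta: one small kernel applied at every spatial position.
kernel = np.array([1.0, -2.0, 1.0])
conv_out = np.convolve(y0, kernel, mode="valid")
```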
An RNN is a neural network that propagates information forward in time and accepts time-dependent inputs. As an example of transfer learning, suppose someone trains a neural network to determine whether or not an image contains a cat. Next, suppose they want to train another neural network to determine whether or not an image contains a dog.
ENHANCEMENT OF SHOCK CAPTURING METHODS VIA MACHINE LEARNING
Introduction
Numerical Methods
- Description of WENO-NN
For example, applying the method to WENO5-JS results in an upwind-biased finite volume method with coefficients that depend on the local solution. We use WENO5-JS to preprocess the input data so that the input to the neural network is the set of finite-volume coefficients found by WENO5-JS. After determining the nonlinear weights 𝑤𝑖 according to the WENO5-JS algorithm, the coefficients applied to each cell mean are computed as the 𝑤𝑖-weighted combination of the candidate-stencil coefficients.
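As a reference point, the standard WENO5-JS procedure (smoothness indicators, nonlinear weights, then the weighted combination of candidate-stencil coefficients) can be sketched as follows; the function name and the value of ε are illustrative, the constants are the standard Jiang-Shu ones:

```python
import numpy as np

def weno5_coefficients(fbar, eps=1e-6):
    """Standard WENO5-JS: smoothness indicators -> nonlinear weights w_k ->
    a single set of 5 coefficients applied to the cell means fbar[0..4]."""
    f0, f1, f2, f3, f4 = fbar
    # Jiang-Shu smoothness indicators for the three candidate stencils
    b0 = 13/12*(f0 - 2*f1 + f2)**2 + 1/4*(f0 - 4*f1 + 3*f2)**2
    b1 = 13/12*(f1 - 2*f2 + f3)**2 + 1/4*(f1 - f3)**2
    b2 = 13/12*(f2 - 2*f3 + f4)**2 + 1/4*(3*f2 - 4*f3 + f4)**2
    gamma = np.array([0.1, 0.6, 0.3])               # optimal linear weights
    alpha = gamma / (eps + np.array([b0, b1, b2]))**2
    w = alpha / alpha.sum()                          # nonlinear weights w_k
    # Coefficients of each third-order candidate stencil on the 5 cell means
    c = np.array([[2, -7, 11, 0, 0],
                  [0, -1,  5, 2, 0],
                  [0,  0,  2, 5, -1]]) / 6.0
    return w @ c                                     # convex combination

# On linear data every candidate stencil is exact, so the reconstructed
# interface value equals the true value 3.5 regardless of the weights:
fbar = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
coeffs = weno5_coefficients(fbar)
value = coeffs @ fbar
```

This five-vector of coefficients is exactly the quantity the network receives as preprocessed input.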
We can use such a small network because the WENO5-JS coefficients are a useful model input and a strong prior, so WENO5-JS performs a significant portion of the required processing. 𝐿2 regularization is applied to the output of the neural network to penalize deviations from WENO5-JS, which encourages the network to change WENO5-JS's answer only when a better result is expected. We also see that WENO5-JS converges at fifth order and WENO1 converges at first order as Δ𝑥 → 0, as expected.
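The effect of this penalty can be seen in a single-interface toy problem: minimizing the squared reconstruction error plus λ‖d‖² over a coefficient perturbation d has a closed form, and a larger λ shrinks the perturbation toward zero, i.e. toward leaving the baseline answer unchanged. The baseline coefficients and data below are made up for illustration and are not the thesis network:

```python
import numpy as np

def optimal_perturbation(c, fbar, target, lam):
    """Closed-form minimizer of ((c + d) @ fbar - target)**2 + lam*||d||**2:
    d* = -e0 * fbar / (||fbar||^2 + lam), with e0 the baseline error."""
    e0 = c @ fbar - target
    return -e0 * fbar / (fbar @ fbar + lam)

c = np.array([0.1, -0.3, 0.9, 0.4, -0.1])   # illustrative baseline coefficients
fbar = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # local cell means
target = 3.5                                 # desired interface value

d_small = optimal_perturbation(c, fbar, target, lam=0.1)
d_large = optimal_perturbation(c, fbar, target, lam=100.0)
# Stronger regularization pulls the perturbation toward zero, so the
# baseline coefficients are only changed when it pays off.
```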
- Other Numerical Methods Used
- Machine Learning Methodology
- Results
- Advection Equation
- Inviscid Burgers’ Equation
- Discussion and Conclusions
We then decided to let the neural network perturb the WENO5-JS coefficients, which we found significantly improved the performance of the method. One can see that the solution using WENO-NN provides a closer visual fit to the exact solution, as WENO5-JS spreads the discontinuity more significantly than WENO-NN. We see a similar behavior in the discontinuity width, where WENO-NN reaches its steady value relatively quickly, while WENO5-JS continues to spread.
On the very fine grid, both WENO5-JS and WENO-NN are similar (provided WENO-NN is stable, it is guaranteed to converge at at least first order). We then observe similar behavior in the Shu-Osher problem, where we see that WENO5-JS spreads the fine features of the solution more than WENO-NN. A disadvantage of WENO-NN is that it does not inherit the high-order convergence of WENO5-JS.
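The convergence orders quoted here are typically estimated empirically from errors on successively refined grids, via p ≈ log(e_coarse/e_fine)/log(r) for refinement factor r. A minimal sketch with illustrative error values:

```python
import numpy as np

def observed_order(e_coarse, e_fine, refinement=2.0):
    """Estimate convergence order p from errors on two grids:
    e ~ C*dx^p  =>  p = log(e_coarse/e_fine) / log(refinement)."""
    return np.log(e_coarse / e_fine) / np.log(refinement)

# A fifth-order scheme halving dx cuts the error by ~2^5 = 32, while a
# first-order scheme only halves it (error values here are illustrative):
p5 = observed_order(3.2e-4, 1.0e-5)   # ~5th order
p1 = observed_order(3.2e-4, 1.6e-4)   # ~1st order
```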
FINITENET: A FULLY CONVOLUTIONAL LSTM NETWORK ARCHITECTURE FOR TIME-DEPENDENT PARTIAL DIFFERENTIAL EQUATIONS
- Introduction
- Methodologies
- Network Architecture
- Training Algorithm
- Accuracy Constraints
- Simulation Results
- Summary
- Linear Advection Equation
- Inviscid Burgers’ Equation
- Kuramoto-Sivashinsky Equation
- Comparison with other temporal modeling techniques
- Discussion
Within each evaluation of the LSTM, the information is used to calculate the solution at the next substep in a manner that mimics a traditional FDM or FVM. The network outputs the hidden information passed to the next substep, as well as a prediction of the FVM or FDM coefficients. The loss is calculated by downsampling the exact solution to the neural network grid and averaging the squared error over every point in time and space.
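The substep structure can be sketched as follows. This is only a schematic with an assumed interface, not the FiniteNet implementation: `predict` stands in for the trained convolutional LSTM cell (here replaced by fixed centered-difference coefficients), and a periodic domain with a 3-point stencil is assumed:

```python
import numpy as np

def hybrid_substep(u, hidden, predict, dt):
    """One FiniteNet-style substep (schematic): a learned cell maps the
    current solution and hidden state to a new hidden state plus per-point
    stencil coefficients, which are applied exactly like an FDM update."""
    hidden, coeffs = predict(u, hidden)          # coeffs: (n, 3) stencil weights
    u_pad = np.pad(u, 1, mode="wrap")            # periodic domain assumed
    stencil = np.stack([u_pad[:-2], u_pad[1:-1], u_pad[2:]], axis=1)
    dudt = np.sum(coeffs * stencil, axis=1)      # learned spatial operator
    return u + dt * dudt, hidden

# Stand-in for the trained ConvLSTM cell: fixed centered-difference weights.
def centered_difference(u, hidden, dx=2*np.pi/50):
    coeffs = np.tile([-1/(2*dx), 0.0, 1/(2*dx)], (len(u), 1))
    return hidden, coeffs

u0 = np.sin(np.arange(50) * 2*np.pi/50)
u1, h = hybrid_substep(u0, None, centered_difference, dt=1e-3)
```

Because the network only supplies the coefficients, the update itself remains a conservative stencil operation, which is what lets the hybrid mimic a traditional scheme.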
However, many details regarding the model and training process have been changed. We plot the total variation of the initial state and compare it with the error ratio in Figure 4.9. Therefore, we can conclude that trying to model the temporal behavior of the PDE improves the performance of the model.
APPLICATION OF TRANSFER LEARNING TO WENO SCHEMES
Introduction
WENO-TL Algorithm
- Model
- Training
First, we make no modifications to the WENO-NN architecture, and simply retrain the last layer of the neural network (WENO-TL4). A second variant simply applies linear regression from the cell means to the perturbation (WENO-TL5). We then use WENO-TL to run the same simulation on a coarse grid, calculate the 𝐿2 error of the WENO-TL solution, calculate the gradient of that error with respect to the neural network parameters being retrained, update the parameters with the ADAM optimizer (Kingma and Ba, 2014), and repeat until training is complete.
A flow diagram showing the steps of the simulation performed during the training process can be seen in Figure 5.2, which outlines the steps that take the initial data to the loss. Note that we again apply regularization to the model by penalizing perturbations to WENO5-JS. The entire simulation is implemented in PyTorch so that the computation graph follows each step of the computation rather than only the steps that explicitly involve the neural network.
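The retraining loop described above can be sketched in miniature. This is only a schematic, not the thesis code: a first-order stand-in replaces WENO5-JS, the "simulation" is collapsed to a single interface reconstruction on synthetic quadratic data, plain gradient descent replaces ADAM, and the model shapes and regularization weight `lam` are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in baseline: first-order coefficients (the thesis perturbs WENO5-JS).
c_base = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
lam = 1e-3   # penalty on perturbations away from the baseline (assumed value)

# Synthetic data: exact cell means of quadratics a*x^2 + b*x + c on unit cells
# centered at x = -2..2, with the exact interface value at x = 1/2 as target.
x = np.arange(-2.0, 3.0)
abc = 0.3 * rng.standard_normal((20, 3))
F = np.stack([a*(x**2 + 1/12) + b*x + c for a, b, c in abc])
t = abc @ np.array([0.25, 0.5, 1.0])

# WENO-TL5-style model: affine map from cell means to the coefficient
# perturbation, delta = A @ fbar + b0 (shapes chosen for illustration).
A, b0 = np.zeros((5, 5)), np.zeros(5)

def loss_and_grads(A, b0):
    delta = F @ A.T + b0                              # (N, 5) perturbations
    err = np.sum((c_base + delta) * F, axis=1) - t    # signed error per sample
    loss = np.mean(err**2) + lam*np.mean(np.sum(delta**2, axis=1))
    g = 2.0*(err[:, None]*F + lam*delta) / len(F)     # dloss/ddelta
    return loss, g.T @ F, g.sum(axis=0)               # grads w.r.t. A and b0

loss0 = loss_and_grads(A, b0)[0]
for _ in range(300):            # plain gradient descent stands in for ADAM
    _, gA, gb = loss_and_grads(A, b0)
    A -= 1e-4 * gA
    b0 -= 1e-4 * gb
loss1 = loss_and_grads(A, b0)[0]
```

In the actual method, PyTorch's autograd replaces the hand-written gradient, which is what makes differentiating through every step of the simulation practical.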
Results
- Cases Examined
- Error reduction achieved by WENO-TL
- Density bump for Euler equations
- Random initial conditions for Euler equations
- Convergence Results
A summary of the statistics of the error ratio relative to WENO5-JS, 𝑒𝑟, for each network can be seen in Table 5.1. We also present PMFs for each model to get a more precise understanding of the distribution of these error ratios. We also look at the performance of each model on the 2D Euler equations for a 2D analogue of our density bump test case.
We can look again at the PMFs of the error ratios generated with each model, shown in Figure 5.4. We also look at the PMFs of the error ratios for each model, again constructing 1000 previously unseen initial conditions to gain a more detailed understanding of the out-of-sample performance of each model. Interestingly, for the case where both the initial density and pressure are randomly generated, none of the hybrid models always yields a more accurate solution, although they are more accurate in the vast majority of cases.
We see much slower convergence rates for WENO-TL5 and WENO-TL6, with the other algorithms converging at similar rates. We also look at how quickly the simulation converges for each model for the 1D case with random initial conditions in Figure 5.8. For this case we use an initial condition that is representative of typical accuracy, solve the PDE on a very fine grid with WENO5-JS to obtain an 'exact' solution, and coarsen by six different factors to see how the error varies with discretization width.
Runtime Results
We can see that WENO-JS and WENO-TL7 are very similar in terms of runtime and error for this density test case for the PyTorch code. Additionally, in regions where this overlap occurs, the memory requirements for WENO-TL7 are much lower due to the coarser mesh. For the points closest to the overlap, WENO-TL7 uses a 5x coarser mesh, which of course corresponds to a 5x reduction in memory required.
As in the previous two runtime studies, we again use the same simulation as the convergence study to obtain the data. We see that the learned transfer schemes struggle to compete with WENO-JS here. Interestingly, we see very different trends for this case than we saw for the density collision case.
CONCLUSION AND FUTURE WORK
Conclusions
We also show that using an LSTM leads to an error reduction relative to other temporal modeling techniques, all of which outperform a model that contains no temporal component. In Chapter 5, we apply transfer learning to WENO-NN to develop equation-specific shock-capturing methods and name the resulting algorithm WENO-TL. By using the parameters of WENO-NN as initial weights for the network in WENO-TL, the training results are more consistent, eliminating the need for aggressive early stopping that was required when training WENO-NN.
We then see that by training the network on equation-specific data, we can achieve greater error reduction than WENO-NN achieves. We also see a very promising generalization behavior of WENO-TL, as we can train the network on solutions of 1D Euler equations and see remarkably similar results in terms of error to the performance of WENO5-JS when testing the same networks on 2D Euler equations. However, we also see that if the space of initial conditions on which we try to train the model is too wide, the learned scheme does not perform as well.
Future Work
High-order methods are simply an inner product between a constant vector of coefficients and the local solution, and even advanced algorithms like WENO5-JS are not particularly expensive per iteration. Robustness is a very important characteristic for industrial CFD codes such as those developed by ANSYS, Inc., and robustness against practical complications such as high-aspect-ratio elements remains unknown for the approaches developed in this thesis.
It would also be interesting to investigate the effectiveness of these algorithms on simulations with more complicated physics, such as multiphase or turbulent flows. It seems likely that more training data would lead to better results on more general problems, such as the case of random initial states. The neural operator model learns how to solve the PDE directly, which could provide a very large speedup when solving many perturbations of the same problem, such as slightly different configurations of an actively controlled airfoil.
BIBLIOGRAPHY
LeVeque, R. J. (2007). Finite Difference Methods for Ordinary and Partial Differential Equations: Steady-State and Time-Dependent Problems. SIAM.
Raissi, M., P. Perdikaris, and G. E. Karniadakis (2019). “Physics-Informed Neural Networks: A Deep Learning Framework for Solving Forward and Inverse Problems Involving Nonlinear Partial Differential Equations”. Journal of Computational Physics.