in Partial Fulfillment of the Requirements for the Degree of

I would like to thank my guide Dr. Raja Banerjee for giving me an opportunity to work with him. His guidance in my research work has been invaluable. I would also like to thank Sathi Rajesh Reddy for his kind support and encouragement throughout the work on this thesis. I would also like to thank Mr. Madhu Pandicheri for his timely support in the CAE lab.

The fluid dynamics aspect involves using the GPU-based Navier-Stokes solver to study turbulent flows. The Large Eddy Simulation (LES) turbulence model has been successfully implemented to analyze 2D incompressible flow. Recently, there has been great interest in the scientific computing community in using GPUs for general-purpose computation (such as the numerical solution of PDEs) rather than graphics alone.

To explore the use of GPUs for CFD simulations, an incompressible Navier-Stokes solver for a GPU was developed. Because LES is known to be sensitive to inlet boundary conditions, the effect of different inlet boundary conditions was observed and summarized for a mixing layer problem.

Introduction

General Introduction

CFD using Graphics Processing Units

Multigrid Technique

A brief overview of turbulence

The RANS equations are derived by time averaging the Navier-Stokes equations after replacing each variable with the sum of its mean and fluctuating components. The similarity between molecular mixing and turbulent fluctuations forms the basis of the Boussinesq eddy viscosity hypothesis. According to this hypothesis, the Reynolds stress tensor is linearly proportional to the mean strain rate tensor, with the dynamic eddy viscosity µt as the coefficient of proportionality; k denotes the specific turbulent kinetic energy of the fluctuations. Once the Boussinesq approach is applied, the main purpose of the turbulence model is to determine the turbulent viscosity µt. Based on the approach taken to model µt, turbulence models are classified as zero-equation (algebraic), one-equation, and two-equation models.
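For reference, the Boussinesq hypothesis described above is commonly written in the following textbook form. This is a sketch under the usual assumptions for incompressible flow; the symbols $S_{ij}$, $U_i$, and $\delta_{ij}$ are introduced here for illustration and may differ from the notation used elsewhere in this work.

```latex
-\rho\,\overline{u'_i u'_j} = 2\mu_t S_{ij} - \frac{2}{3}\,\rho k\,\delta_{ij},
\qquad
S_{ij} = \frac{1}{2}\left(\frac{\partial U_i}{\partial x_j} + \frac{\partial U_j}{\partial x_i}\right),
\qquad
k = \frac{1}{2}\,\overline{u'_i u'_i}
```

The isotropic term $\frac{2}{3}\rho k\,\delta_{ij}$ ensures that the trace of the modeled stress equals $-2\rho k$, consistent with the definition of $k$.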

Models where the eddy viscosity is fully determined in terms of the local mean velocity and prescribed parameters are referred to as zero-equation, or algebraic, models. Two popular zero-equation models are the Cebeci-Smith [2] and Baldwin-Lomax [1] models. In general, zero-equation models perform quite well for free shear flows. In one-equation models, a transport equation is solved for the turbulent kinetic energy, and the turbulent length scale is prescribed algebraically.

Two-equation models have served as the basis for most of the turbulence modeling research over the past twenty years. These models are called complete, that is, they can be used to predict the properties of a given flow without prior knowledge of the turbulence structure.

Introduction to Large Eddy Simulation

Literature Survey

Objective of the present work

GPU Architecture

CUDA programming model

The data processed by the GPU is first transferred from the host memory to the device's global memory. Each block has shared memory that is visible to all threads of the block, and each thread has private local memory that is visible only to that thread. There are also two additional read-only memory spaces accessible to all threads: constant memory and texture memory.

Constant memory provides faster and more parallel data access paths for CUDA core execution than the global memory [9]. Texture memory is used to speed up frequently performed operations and also provides a way to interact with the display capabilities of the GPU.

Implementation in CUDA

CUDA devices have different types of memory that programmers can use to achieve high performance, as illustrated in Figure 2.2 [8]. The pressure Poisson equation must be converged to a tight tolerance and consumes nearly 80% of the total computational time. A number of iterative linear solvers are available, but the chosen solver must suit the parallel nature of the GPU.

For an implicit algorithm, the variable at each point is calculated from its neighboring points at the same time step. For the algorithm to run on parallel threads, there must be no dependencies between variables handled by different threads. For the second-order stencil used to solve the 2-D pressure Poisson equation, two colors (e.g., red and black) are required to generate two sets of points that are independent of each other.

The thread ID and block ID provided by CUDA can be used to map a thread to a point of a given color, and the colors are processed sequentially. Figure 2.3 [9] shows the colored domain and the thread configuration for processing the points.

Figure 2.3: CUDA thread mapping arrangement
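The red-black ordering can be illustrated with a short serial sketch. The Python code below is not the CUDA implementation of this work; it only assumes a uniform grid, fixed boundary values, and the standard five-point stencil, and the function name `red_black_gauss_seidel` is hypothetical. The two colors are processed one after the other, and all points of one color depend only on points of the other color.

```python
def red_black_gauss_seidel(p, rhs, h, sweeps):
    """Relax the 2-D Poisson equation lap(p) = rhs with fixed boundary values.

    p and rhs are lists of lists with the same shape; h is the uniform grid
    spacing. Interior points are colored by the parity of (i + j).
    """
    ny, nx = len(p), len(p[0])
    for _ in range(sweeps):
        for color in (0, 1):  # 0 = red, 1 = black, processed sequentially
            # Points of one color only read points of the other color, so on
            # a GPU each point of the active color could use its own thread.
            for j in range(1, ny - 1):
                for i in range(1, nx - 1):
                    if (i + j) % 2 == color:
                        p[j][i] = 0.25 * (p[j][i - 1] + p[j][i + 1]
                                          + p[j - 1][i] + p[j + 1][i]
                                          - h * h * rhs[j][i])
    return p
```

In a CUDA version, the two inner loops would typically be replaced by one kernel launch per color, with the thread and block IDs selecting the point to update.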

Multigrid technique

Solution Algorithm for Multigrid on GPU

The corrections at the fine grid points are now added to the intermediate solution to give a better initial estimate, and further refinement is done at each level. Finally, the prolongated correction is added to the intermediate solution on the finest grid, and convergence is checked after smoothing.
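The correction step described above can be sketched for a 1-D model problem. The Python code below is an illustrative V-cycle for -u'' = f with zero end values, assuming Gauss-Seidel smoothing, full-weighting restriction, and linear prolongation; it is not the 2-D GPU implementation, and all function names are hypothetical. The number of intervals is assumed to be a power of two.

```python
def smooth(u, f, h, sweeps):
    """Gauss-Seidel sweeps for -u'' = f with fixed end values."""
    for _ in range(sweeps):
        for i in range(1, len(u) - 1):
            u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])


def residual(u, f, h):
    """Residual r = f - A u of the discrete equation at interior points."""
    r = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        r[i] = f[i] - (2.0 * u[i] - u[i - 1] - u[i + 1]) / (h * h)
    return r


def restrict(r):
    """Full-weighting restriction to a grid with half as many intervals."""
    nc = (len(r) - 1) // 2
    rc = [0.0] * (nc + 1)
    for i in range(1, nc):
        rc[i] = 0.25 * r[2 * i - 1] + 0.5 * r[2 * i] + 0.25 * r[2 * i + 1]
    return rc


def prolongate(ec, n_fine):
    """Linear interpolation of the coarse-grid correction to the fine grid."""
    e = [0.0] * n_fine
    for i in range(len(ec)):
        e[2 * i] = ec[i]
    for i in range(1, n_fine, 2):
        e[i] = 0.5 * (e[i - 1] + e[i + 1])
    return e


def v_cycle(u, f, h):
    """One V-cycle; u is updated in place and returned."""
    if len(u) <= 3:  # coarsest grid: a single interior point, solved exactly
        u[1] = 0.5 * (u[0] + u[2] + h * h * f[1])
        return u
    smooth(u, f, h, 2)                          # pre-smoothing
    rc = restrict(residual(u, f, h))            # restrict the residual
    ec = v_cycle([0.0] * len(rc), rc, 2.0 * h)  # coarse-grid error equation
    e = prolongate(ec, len(u))                  # prolongate the correction
    for i in range(1, len(u) - 1):
        u[i] += e[i]                            # correct intermediate solution
    smooth(u, f, h, 2)                          # post-smoothing
    return u
```

Calling `v_cycle` repeatedly and checking the residual after each cycle mirrors the convergence check after smoothing on the finest grid.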

Pseudocode

Governing Equation

Numerical Methodology for Solving Navier-Stokes Equation

Solution Algorithm

Integrating the continuity equation over a finite volume gives $\int_V \nabla\cdot\mathbf{u}\,dV = \oint_S \mathbf{u}\cdot d\mathbf{S} \approx \sum_f \mathbf{u}_f\cdot\mathbf{S}_f = 0$, where $\mathbf{S}_f$ is the surface vector representing the area of face $f$ of the cell and $\mathbf{u}_f$ is the velocity defined at the center of face $f$. Based on conservation of mass, the discretized form of the continuity equation can be written as $\sum_f F_f = 0$, where $F_f = \mathbf{u}_f\cdot\mathbf{S}_f$ is the outward flux through face $f$. The same $F_f$ serves as the convective flux, with $\mathbf{u}_f$ the value of $\mathbf{u}$ at the center of the face.
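As a small illustration of the discrete continuity equation, the sketch below sums the outward face fluxes of a single rectangular 2-D cell. It assumes unit depth and face-centered velocities; the function name `net_mass_flux` and its arguments are hypothetical.

```python
def net_mass_flux(u_east, u_west, v_north, v_south, dx, dy):
    """Sum of outward face fluxes F_f = u_f . S_f for a rectangular cell."""
    flux_east = u_east * dy     # outward normal +x, face area dy
    flux_west = -u_west * dy    # outward normal -x
    flux_north = v_north * dx   # outward normal +y, face area dx
    flux_south = -v_south * dx  # outward normal -y
    return flux_east + flux_west + flux_north + flux_south
```

For a divergence-free field such as u = x, v = -y, the returned sum vanishes for any cell, which is the condition the pressure correction enforces.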

It is found that the first-order upwind scheme unconditionally satisfies the boundedness criterion and never yields an oscillatory solution, but it introduces excessive numerical diffusion. A scheme based on linear interpolation instead calculates the values at the face from the values at the adjacent points. In the first step of the solution algorithm, the pressure term is dropped and the equation is solved explicitly for a fictitious velocity field, called the mass velocities u∗ and v∗.

In the second step, the corrected pressure field is found by solving the pressure Poisson equation using the mass velocities obtained from the first step. The pressure Poisson equation is derived from the momentum equations by enforcing the divergence-free condition on the velocity field, following Thibault and Senocak [10]. The pressure terms are dropped and the mass velocities u∗ and v∗ are solved for explicitly, as in equations 3.18 and 3.19.

The pressure Poisson equation is solved implicitly on the GPU using red-black Gauss-Seidel with multigrid.
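The steps above follow the usual fractional-step pattern, which can be summarized in semi-discrete form as below. This is a sketch assuming forward Euler time integration and constant density $\rho$ and kinematic viscosity $\nu$; it is not a reproduction of the numbered equations of this chapter, and the vector notation $\mathbf{u}^{*} = (u^{*}, v^{*})$ is introduced here for brevity.

```latex
\begin{aligned}
\frac{\mathbf{u}^{*} - \mathbf{u}^{n}}{\Delta t}
  &= -(\mathbf{u}^{n}\cdot\nabla)\,\mathbf{u}^{n} + \nu\,\nabla^{2}\mathbf{u}^{n} \\
\nabla^{2} p^{n+1}
  &= \frac{\rho}{\Delta t}\,\nabla\cdot\mathbf{u}^{*} \\
\mathbf{u}^{n+1}
  &= \mathbf{u}^{*} - \frac{\Delta t}{\rho}\,\nabla p^{n+1}
\end{aligned}
```

Taking the divergence of the third relation and requiring $\nabla\cdot\mathbf{u}^{n+1} = 0$ recovers the second, which is how the divergence-free condition leads to the pressure Poisson equation.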

Reynolds Averaged Navier-Stokes (RANS) Equations

The term $\rho\,\overline{u'_i u'_j}$ is known as the Reynolds stress tensor which consists of three normal stress components and six shear stress components which are not known a priori and must be modeled for the closure of the set of governing equations.

The k − ε model

The solution algorithm for RANS based turbulence models

Equations 3.62 and 3.63 above are solved by Gauss-Seidel iteration until convergence to find the updated values $k^{n+1}$ and $\varepsilon^{n+1}$.

Spectral analysis

Filtering

Smagorinsky-Lilly model

Brief discussion on Smagorinsky’s constant

Inlet boundary conditions

Solution Algorithm for LES

Equations 4.8 and 4.9 are solved explicitly to find U∗ and V∗. The pressure equation is then solved according to equation 3.28. The eddy viscosity is calculated from the Smagorinsky-Lilly model (equation 4.16), where Sij is calculated from the corrected velocity field.
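The eddy viscosity step can be sketched as follows for a single 2-D cell. The code assumes the standard Smagorinsky-Lilly form, nu_t = (Cs Δ)² |S| with |S| = sqrt(2 Sij Sij), and a filter width Δ = sqrt(dx dy); the filter width choice, the function name, and the argument names are assumptions rather than details taken from this work.

```python
import math


def smagorinsky_eddy_viscosity(dudx, dudy, dvdx, dvdy, dx, dy, cs):
    """Kinematic eddy viscosity nu_t = (Cs * Delta)^2 * |S| for one 2-D cell."""
    s11 = dudx
    s22 = dvdy
    s12 = 0.5 * (dudy + dvdx)  # symmetric off-diagonal strain rate
    # |S| = sqrt(2 * S_ij * S_ij); the off-diagonal term appears twice.
    strain_magnitude = math.sqrt(2.0 * (s11 * s11 + s22 * s22 + 2.0 * s12 * s12))
    delta = math.sqrt(dx * dy)  # assumed filter width
    return (cs * delta) ** 2 * strain_magnitude
```

The velocity gradients passed in would be evaluated from the corrected velocity field, and the result is added to the molecular viscosity in the next time step.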

Laminar Code validation

The problem was solved on the GPU using the red-black Gauss-Seidel algorithm on a single grid.

Turbulence code validation

The turbulent kinetic energy k and the dissipation rate ε are calculated from the turbulent intensity I and the viscosity ratio.
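One common way to obtain these inlet values is sketched below. It assumes the widely used relations k = 3/2 (U I)² and ε = Cμ k² / ν_t, with ν_t obtained from the viscosity ratio ν_t/ν and Cμ = 0.09; the exact relations used in this work are not shown here, so the function `inlet_k_epsilon` and its arguments are illustrative only.

```python
def inlet_k_epsilon(u_mean, intensity, viscosity_ratio, nu, c_mu=0.09):
    """Estimate inlet k and epsilon from turbulence intensity and nu_t / nu."""
    k = 1.5 * (u_mean * intensity) ** 2  # isotropic fluctuations assumed
    nu_t = viscosity_ratio * nu          # eddy viscosity from the ratio
    epsilon = c_mu * k * k / nu_t        # inverted from nu_t = C_mu k^2 / eps
    return k, epsilon
```

The second relation is simply the k − ε definition of the eddy viscosity solved for ε.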

Figure 5.5: V vs. X for red-black GS on multigrid

LES model

According to Wilcox [16], for the resolved energy spectrum to reach the inertial subrange, the LES length scale should be approximately 10 times the Kolmogorov length scale. From the above equation, the Kolmogorov time scale is calculated to be of order $10^{-3}$. Due to the CFL restriction of the explicit scheme, the time step is taken as $10^{-4}$ for the present problem.
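The scale estimates above can be reproduced with a few lines of arithmetic. The sketch below assumes the standard Kolmogorov definitions η = (ν³/ε)^(1/4) and τ = (ν/ε)^(1/2) and a simple advective CFL limit Δt = CFL · Δx / u_max; the function names and any input values are hypothetical and are not the values of the present problem.

```python
def kolmogorov_scales(nu, epsilon):
    """Return the Kolmogorov length scale eta and time scale tau."""
    eta = (nu ** 3 / epsilon) ** 0.25
    tau = (nu / epsilon) ** 0.5
    return eta, tau


def cfl_time_step(cfl, dx, u_max):
    """Largest time step allowed by an advective CFL number."""
    return cfl * dx / u_max
```

A grid spacing of roughly ten times `eta` and a time step below both `tau` and the CFL limit would be consistent with the choices described above.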

Figure 5.9: Inlet Profile for LES

Conclusion

In LES, the downstream flow was found to be sensitive to inlet conditions, so realistic fluctuating inlet boundary conditions are required. It was also found that the Smagorinsky constant does not have a universal value and that a high value damps the large-scale oscillations. A suitable Cs value for the current 2D mixing layer is 0.08.

Future Scope

[10] Thibault JC, Senocak I, "CUDA Implementation of a Navier-Stokes Solver on Multi-GPU Desktop Platforms for Incompressible Flows", 47th AIAA Aerospace Sciences Meeting, Orlando, Florida (Jan 2009).

Zang and Ugo Piomelli, "Application of the dynamic subgrid-scale model to axisymmetric transitional boundary layer at high speed", Phys. of Fluids.

[20] Bert Vreman, Bernard Geurts, Hans Kuerten, "Large Eddy simulation of the turbulent mixing layer", J.

Celik, "Random Flow Generation Technique for Large Eddy Simulations and Particle-Dynamics Modeling", Transactions of the ASME, Journal of Fluids Engineering.