Computer hardware has advanced considerably over several decades, particularly graphics processors, which no longer handle only graphics workloads but are also used for scientific computation. Many research groups have implemented parallel computation using the MPI method on multiple CPUs to solve difficult and complex problems. The multi-threaded GPU is another solution to this technological limitation and can make the calculation faster.
This paper compares the MPI method with the CUDA method and then tests the FDTD simulation using each of them. This experiment makes the PIC simulation, which is time-consuming because of its large number of particles, practical. Since the GPU has many threads to deal with these particles, the simulation time can be reduced by using the CUDA method.
Each component of the Gaussian wave in the two-dimensional domain was displayed using the VisIt software.
Introduction
Since a large number of threads can run in parallel in the simulation, this method is very useful. While the MPI method using multiple CPUs lets each CPU calculate the portion of the work divided by the total number of CPUs, in the CUDA method all threads calculate their portion of the work divided by the number of threads [8]. The FDTD simulation is well suited to parallel computing because the domain consists of many discrete cells that can be divided among the CPUs or threads.
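As a minimal sketch of the thread-per-cell division on the GPU (the kernel and variable names here are illustrative, not the paper's code), each thread derives a global index and updates exactly one cell:

#include <cuda_runtime.h>

/* Illustrative sketch: one thread per cell. Under MPI, rank r of P
   processes would instead loop over its contiguous chunk of cells. */
__global__ void update_kernel(float *field, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  /* global thread ID   */
    if (i < n)
        field[i] += 1.0f;                           /* placeholder update */
}

int main(void)
{
    const int n = 100;                              /* cells in the domain */
    float *d_field;
    cudaMalloc(&d_field, n * sizeof(float));
    cudaMemset(d_field, 0, n * sizeof(float));
    update_kernel<<<(n + 127) / 128, 128>>>(d_field, n);
    cudaDeviceSynchronize();
    cudaFree(d_field);
    return 0;
}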
Electromagnetic calculations based on Maxwell's equations were carried out by FDTD simulation to solve complex problems that are difficult to study in the real world. The simulations used an Intel Core i7-4770 CPU (3.40 GHz) and an NVIDIA GeForce GTX Titan Black graphics card, running 64-bit Linux CentOS 6.4. The HDF5 format, developed at the National Center for Supercomputing Applications, was used to store the electromagnetic field data.
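As a minimal sketch of how a field snapshot could be written in this format (the file and dataset names are illustrative assumptions, not the paper's code):

#include <hdf5.h>

/* Illustrative sketch: write one 2-D field array to an HDF5 file. */
int main(void)
{
    const hsize_t dims[2] = {12, 16};   /* ny x nx cells          */
    float field[12][16] = {{0.0f}};     /* field data to be saved */

    hid_t file  = H5Fcreate("fields.h5", H5F_ACC_TRUNC,
                            H5P_DEFAULT, H5P_DEFAULT);
    hid_t space = H5Screate_simple(2, dims, NULL);
    hid_t dset  = H5Dcreate2(file, "/Hz", H5T_NATIVE_FLOAT, space,
                             H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    H5Dwrite(dset, H5T_NATIVE_FLOAT, H5S_ALL, H5S_ALL, H5P_DEFAULT, field);
    H5Dclose(dset);
    H5Sclose(space);
    H5Fclose(file);
    return 0;
}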
FDTD theory
The boundaries of the domain can be a perfect conductor, and the size of each cell must satisfy equation (11). The distribution of the electromagnetic field data in the FDTD simulation is shown in Fig. 1. The electric field components are located on the edges of each cell and the magnetic field components on its faces.
However, what matters is not exactly where the components are located, but that they interleave so that each component can be computed from its neighbors. This schematization helps in understanding the position of each component's data in the FDTD simulation. It is important to keep track of the exact cell index, including the positions of the electric and magnetic fields within the domain.
The magnetic field is induced by the circulating electric field, and the electric field is generated by the induced magnetic field, according to equations (5)-(10). If both fields were induced simultaneously, an error would occur; to avoid this, each field's data must be fetched and updated in turn at every time step. The procedure for calculating the electric and magnetic fields was therefore implemented alternately.
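A minimal sketch of the resulting alternating time loop (update_E and update_H are hypothetical stand-ins for the updates in equations (5)-(10)):

/* Illustrative sketch: the two field updates alternate each step. */
for (int t = 0; t < nsteps; ++t) {
    update_E(Ex, Ey, Hz, nx, ny);  /* E from the previous H */
    update_H(Hz, Ex, Ey, nx, ny);  /* H from the updated E  */
}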
Two-dimensional FDTD simulation
MPI method
The MPI method is a parallel computing approach in which many computers each calculate the work divided by the total number of computers. Suppose there are one hundred cells in the domain and ten computers (i.e., ten CPU cores) running the FDTD simulation; each computer then only needs to compute ten cells. This saves both time and memory, because each computer handles only one-tenth of the work.
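A minimal sketch of this division for the hundred-cell example (the variable names are illustrative, not the paper's code):

#include <mpi.h>

int main(int argc, char **argv)
{
    const int N = 100;                      /* total cells in the domain */
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int chunk = N / nprocs;                 /* 10 cells per rank if P=10 */
    for (int i = rank * chunk; i < (rank + 1) * chunk; ++i) {
        /* update cell i of this rank's sub-domain */
    }
    /* boundary-cell data would be exchanged with neighbor ranks here */
    MPI_Finalize();
    return 0;
}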
However, data transferred over a LAN (Local Area Network) can take too much time when the network is congested. Users must therefore take into account both the number of computers they will use and the size of the data. The following experiment measures the calculation time for different domain sizes.
Three different domain sizes were used in the two-dimensional FDTD simulation to measure the running time of the MPI method. Type C took more than nine times the execution time of type A. In general, an FDTD simulation with a large domain takes a long time to complete.
Execution time was proportional to the size of the domain rather than to the number of processors. Researchers nevertheless need to consider the number of processors when running the FDTD simulation with the MPI method: in our measurements, the MPI method with two processors was much more efficient than with four or eight processors.
Therefore, the two-dimensional FDTD simulation is efficient when the MPI method is used with large amounts of data while keeping the communication overhead as low as possible.
CUDA method
Because data transfer takes a large share of the simulation time, it is very important to consider the amount of work and the total number of threads. To obtain high simulation efficiency, reusing data in shared memory is a good approach [20]; in this paper, however, shared memory was not used. One of the most important points in a CUDA implementation is to compute the correct thread ID and assign it to the correct job.
Since global memory persists for the lifetime of the application, all threads can read and write data in it. Fig. 8 shows a domain of 16 × 12 cells; the marked position sits in the second row and second column of the block array, and in the third row and third column within that block, which corresponds to array[6][6] in the two-dimensional array.
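A minimal sketch of this index calculation, assuming 4 × 4 thread blocks over the 16 × 12 domain (the kernel name is illustrative):

#include <cuda_runtime.h>
#include <stdio.h>

/* Each thread derives its global (row, col) from its block and thread
   indices: block (1,1) with thread (2,2) and 4x4 blocks gives
   row = 1*4 + 2 = 6 and col = 6, i.e. array[6][6].                  */
__global__ void show_index(int nx, int ny)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row == 6 && col == 6)
        printf("thread maps to array[%d][%d]\n", row, col);
}

int main(void)
{
    dim3 block(4, 4);            /* 4x4 threads per block     */
    dim3 grid(16 / 4, 12 / 4);   /* 16x12 cells -> 4x3 blocks */
    show_index<<<grid, block>>>(16, 12);
    cudaDeviceSynchronize();
    return 0;
}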
In the two-dimensional FDTD simulation, in order to propagate the Gaussian beam through the domain, each cell must update its electric and magnetic fields at every time step. The magnetic field is obtained in the same way as the electric field, from Maxwell's equation (7). Listing 2 shows how to obtain the magnetic field in the two-dimensional FDTD simulation.
To update the z-component of the magnetic field in a cell, the data of the neighboring cells are also required. The only difference is that the computational position of the electric field in the simulation is one step behind that of the magnetic field. All variables on the GPU can be used in the same way as variables on the CPU.
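Since Listing 2 itself is not reproduced here, the following is a minimal sketch of such an Hz update kernel (the coefficient ch = dt/mu and the flattened row-major layout are assumptions, not the paper's exact code):

/* Illustrative Hz update: each thread updates one cell from its own
   and its neighbors' E-field values (Faraday's law).               */
__global__ void update_Hz(float *Hz, const float *Ex, const float *Ey,
                          int nx, int ny, float ch, float dx, float dy)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (col >= nx - 1 || row >= ny - 1) return;   /* skip boundary cells */

    int idx = row * nx + col;                     /* flattened index     */
    float curlE = (Ey[idx + 1]  - Ey[idx]) / dx   /* dEy/dx              */
                - (Ex[idx + nx] - Ex[idx]) / dy;  /* dEx/dy              */
    Hz[idx] -= ch * curlE;
}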
When calculating the electromagnetic field with the CUDA method, the data must reside on the graphics card; when saving the data to the storage device, the calculated data must first be transferred back to the memory on the main board and then written out.
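A minimal sketch of this transfer pattern (array names are illustrative; the host buffer would then be written out, e.g. in HDF5 format):

#include <cuda_runtime.h>
#include <stdlib.h>

int main(void)
{
    const int n = 16 * 12;                 /* cells in the domain      */
    size_t bytes = n * sizeof(float);
    float *h_Hz = (float *)malloc(bytes);  /* main-board (host) memory */
    float *d_Hz;
    cudaMalloc(&d_Hz, bytes);              /* graphics-card memory     */

    /* ... launch the field-update kernels on d_Hz here ... */

    cudaMemcpy(h_Hz, d_Hz, bytes, cudaMemcpyDeviceToHost); /* copy back */
    /* ... write h_Hz to the storage device ... */

    cudaFree(d_Hz);
    free(h_Hz);
    return 0;
}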
Results
As in the two-dimensional FDTD simulation, a unique thread ID must also be obtained in the three-dimensional FDTD simulation. Fig. 13 shows the Gaussian beam propagating in the three-dimensional domain, displayed with the VisIt tool.
In this section, we discuss particle motion in the FDTD simulation using the CUDA method; since the cost of PIC simulation comes from its many particles, this is a natural fit for the GPU. Some information about each particle must be stored to calculate its motion in the simulation. Listing 5 shows the data structure of a particle in the two-dimensional simulation.
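Since Listing 5 is not reproduced here, the following is a minimal stand-in using the member names described below (the exact layout is an assumption):

/* Illustrative per-particle state in the two-dimensional simulation. */
struct Particle {
    float x, y;            /* current position                 */
    float prev_x, prev_y;  /* previous position                */
    float ux, uy;          /* relativistic velocity components */
};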
The electromagnetic field in each cell causes the particles to move according to the Lorentz force, given by equations (9)-(11). One difficulty in applying these equations in the FDTD simulation arises from the leapfrog algorithm, since the electric and magnetic fields are not defined at the same time step. With this taken into account, the field data in the cell can be applied to the equations to obtain the acceleration of the particle.
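As a minimal illustration (non-relativistic for brevity, whereas equations (9)-(11) in the paper use relativistic velocities), the acceleration for in-plane E = (Ex, Ey) and out-of-plane B = (0, 0, Bz) follows from F = q(E + v × B):

/* Illustrative sketch: with v = (vx, vy, 0) and B = (0, 0, Bz),
   v x B = (vy*Bz, -vx*Bz, 0), so a = (q/m)(E + v x B).        */
__device__ void lorentz_accel(float q, float m, float vx, float vy,
                              float Ex, float Ey, float Bz,
                              float *ax, float *ay)
{
    *ax = (q / m) * (Ex + vy * Bz);
    *ay = (q / m) * (Ey - vx * Bz);
}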
As the Gaussian beam propagated, the particles were affected by the beam's electromagnetic field in the cells and moved accordingly. So far, the CUDA algorithm has been discussed and the efficiency of the FDTD simulation in parallel computing has been compared with the MPI method. Furthermore, particle motion in two- and three-dimensional FDTD simulations using the CUDA method was presented.
In the simulation, the domain contained nothing but the particles and the Gaussian beam. The particle motion showed the particles being moved by the Gaussian beam as it reflected at the boundaries of the domain. The effectiveness of the CUDA method, together with hardware engineering, can be considered in the FDTD simulation.
Three-dimensional FDTD simulation
Particle movement
A particle with mass and electric charge moves in the electromagnetic field through the Lorentz force. Recently, many researchers have studied accelerators to verify scientific theories or discover new materials. When a particle is created in the simulation, a thread ID is assigned to it, one thread per particle.
If there were a hundred particles in the simulation, the thread IDs would be assigned in order from 0 to 99. The variables x and y are the position of the particle, ux and uy are its relativistic velocity, and prev_x and prev_y are its previous position. Each particle in the domain then needs its current cell position in order to fetch the electromagnetic field data of that cell and calculate the particle motion.
In global memory, however, the fields are stored as one-dimensional arrays, so the one-dimensional index must be computed from the xPos and yPos variables. The electric and magnetic field data cannot be used in the equations directly, because the two fields are not generated at the same time step. The particle used in this part, an electron, has a very small mass, so its acceleration can reach very large values in the electromagnetic field.
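A minimal sketch of this conversion (xPos and yPos follow the text; the row-major layout of an nx-wide field array is an assumption):

/* Illustrative sketch: find the particle's cell and flatten the
   (xPos, yPos) coordinates into a 1-D global-memory index.     */
__device__ int cell_index(float x, float y, float dx, float dy, int nx)
{
    int xPos = (int)(x / dx);  /* cell column containing x */
    int yPos = (int)(y / dy);  /* cell row containing y    */
    return yPos * nx + xPos;   /* row-major flattening     */
}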
In the FDTD simulation, the domain consisted of cells, each containing its own electromagnetic field data, which affect the particles' motion and direction. All particles were placed randomly in the domain, forming a band so that they would be easily affected by the electromagnetic field. Because the source is polarized in the y-direction, the Gaussian beam initially has a stronger electric field in the y-direction than in the other directions, which causes the particles to move farther in the y-direction.
However, this simulation did not address that aspect; only the particle motion using the CUDA method was considered.
Future work
Conclusion