

5.1.1 Mesh numbering and on-the-fly DOF calculation

To perform FEA using only the solid elements, a list of functional DOFs must be prepared. For this purpose, we propose a parallel strategy that takes the mesh connectivity information and the vector of design variables ρi and outputs the list of functional DOFs. Removing DOFs from the structure raises two key issues.

1. Removal of DOFs with boundary conditions: DOFs carrying boundary conditions can become inactive and thereby be removed from the structure. This is a known issue in BESO that hinders convergence. Huang and Xie (2008) suggested checking for such breakage of boundary support at every iteration to alleviate the problem. However, this strategy becomes prohibitively expensive for large-scale structures on the GPU.

2. Locating DOFs with boundary conditions in the reduced list: Another issue is locating the DOFs that carry boundary conditions within the list of functional DOFs on the GPU, which is needed to implement the Dirichlet and Neumann boundary conditions. For example, if a load is applied at the nth DOF, the nth entry in the global load vector needs to be modified. However, if a shortened vector containing only the functional DOFs is used instead of the entire global load vector, there is no easy way to determine the position of the boundary DOF n in the shortened load vector stored in GPU global memory. This is not a problem for CPU implementations because of the sequential nature of the computation, whereas for a GPU implementation it is a major issue. Although a search operation could be implemented to look for the target DOFs at every iteration, such operations are considered highly unsuited to the GPU architecture because they cause massive thread divergence and thus prove highly detrimental to the performance of the application.
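To make issue (2) concrete, the following is a minimal sequential sketch of the lookup that a compacted list forces. The function name `positionInCompactedList` and the assumption that the list of functional DOFs is stored in ascending order are illustrative and are not taken from the original implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Returns the 0-based position of DOF number `n` inside the compacted list of
// functional DOFs, or -1 if that DOF has been removed.
// Assumption: `fDOF` is sorted in ascending order, so a binary search applies.
int positionInCompactedList(const std::vector<int>& fDOF, int n) {
    auto it = std::lower_bound(fDOF.begin(), fDOF.end(), n);
    if (it == fDOF.end() || *it != n) {
        return -1;  // DOF n is not functional in this iteration
    }
    return static_cast<int>(it - fDOF.begin());
}
```

With the functional list {1, 2, 5, 6, 8}, DOF 5 sits at position 2 rather than at position 4 as in the full vector, and the offset changes whenever a lower-numbered DOF is removed or restored. This is why the position cannot be fixed once and reused across iterations.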

It is worth noting that both issues discussed above are relevant only to parallel GPU-based implementations of large-scale problems. For small-scale problems and for sequential CPU implementations, they are not a major concern. To circumvent these issues, a new numbering scheme is used for the finite element mesh. The key idea is to number the DOFs with boundary conditions first and then number the rest of the DOFs in the mesh. The scheme, illustrated in Figure 5.3 with the help of a cantilever beam and an L-beam problem, is similar to the mesh reduction strategy discussed in Chapter 4. As illustrated in the figure, the Neumann and Dirichlet boundary DOFs are numbered first, followed by the rest of the mesh. The total number of boundary condition DOFs (nBC) is passed to the kernels, and the DOFs containing boundary conditions are skipped during the calculation of functional DOFs. Since they are always numbered 1 to nBC, for all problems and all boundary conditions, there is no need to locate the boundary DOFs, and issue (2) is resolved immediately. Furthermore, since the boundary DOFs (1 to nBC) cannot be removed from the vector of functional DOFs [fDOF], there is no risk of removing boundary support, and issue (1) no longer arises.

[Figure: panels (I) and (II), each showing a numbered mesh with an applied load P]

Figure 5.3: Numbering scheme for hybrid BESO.

Using the numbering scheme presented in Figure 5.3, [fDOF] is calculated according to the steps presented in Algo. 12. As shown in Figure 5.2, this step is parallelized with a combination of Thrust calls and a custom CUDA kernel. At the start of Algo. 12, the vector [fDOF] is initialized using Thrust and a raw pointer is extracted for use in the kernel. Following this, [fDOF] is filled with the DOF numbers from 1 to the total number of DOFs in the mesh, nDOF. In line 4, the CUDA kernel calcfDOFkernel is launched, which checks the ρ values of the neighboring elements of every DOF. The non-functional DOF numbers are replaced with an arbitrary negative number in [fDOF].
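The boundary-first numbering described above can be sketched as a renumbering map. This is only an illustration under stated assumptions: the function name `boundaryFirstNumbering`, the 1-based map layout, and the choice to keep the non-boundary DOFs in their original relative order are hypothetical, since the text does not specify how the remaining DOFs are ordered.

```cpp
#include <cassert>
#include <vector>

// Maps original 1-based DOF numbers to new 1-based numbers such that every
// boundary-condition DOF receives a number in 1..nBC and all other DOFs follow.
// `boundaryDOFs` lists the original numbers of the DOFs carrying Dirichlet or
// Neumann conditions, without duplicates. Entry 0 of the result is unused.
std::vector<int> boundaryFirstNumbering(int nDOF, const std::vector<int>& boundaryDOFs) {
    std::vector<int> newNumber(nDOF + 1, 0);  // 0 means "not numbered yet"
    int next = 1;
    for (int d : boundaryDOFs) {
        assert(d >= 1 && d <= nDOF && newNumber[d] == 0);
        newNumber[d] = next++;  // boundary DOFs take 1..nBC
    }
    for (int d = 1; d <= nDOF; ++d) {
        if (newNumber[d] == 0) {
            newNumber[d] = next++;  // remaining DOFs take nBC+1..nDOF
        }
    }
    return newNumber;
}
```

After such a renumbering, nBC equals `boundaryDOFs.size()`, and a load or support at a boundary DOF always sits at a fixed position in 1..nBC of any vector built from the functional DOF list, however many other DOFs are removed.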

A DOF is deemed non-functional if it does not have at least one solid element in its neighborhood. Following the kernel call, thrust::count_if is used to count the positive numbers in [fDOF], which gives the number of functional DOFs in the mesh (nfDOF). Finally, the negative numbers are removed from [fDOF] using thrust::remove_if in line 6 to obtain the final list of functional DOFs for the optimization iteration.

Algorithm 12: Calculating functional DOFs

Input: [ρ]

Output: [fDOF], nfDOF

1 Initialize vector [fDOF] using Thrust;

2 Extract raw pointer for use in kernel;

3 Fill [fDOF] using thrust::sequence;

4 calcfDOFkernel();

5 nfDOF = thrust::count_if(fDOF.begin(), fDOF.end(), is_positive());

6 thrust::remove_if(fDOF.begin(), fDOF.end(), is_negative());

Steps performed in calcfDOFkernel are shown in Algo. 13. At the beginning, a unique thread ID tx is set for every GPU thread. Line 2 of Algo. 13 indicates that DOF numbers 1 to nBC are exempted from the functionality check and thus cannot be removed from [fDOF]. For the cantilever example in Figure 5.3, nDOF is 35 and nBC is 6. Therefore, only DOFs 7 to 35 go through the functionality check. Similarly, for the L-beam example, only DOFs 7 to 27 go through the functionality check. Each thread performs the computation in parallel, checking the sum of the densities of all neighboring elements. A sum less than 1 indicates a non-functional DOF, and the corresponding DOF number in [fDOF] is updated to an arbitrary negative value, as shown in lines 7−9. In the next section, the GPU implementation of hard-kill FEA using the [fDOF] array is discussed.
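The host-side steps of Algo. 12 can be mirrored on the CPU with the standard-library counterparts of the Thrust calls: `std::iota` for `thrust::sequence`, `std::count_if` for `thrust::count_if`, and `std::remove_if` for `thrust::remove_if`. The sketch below is a sequential stand-in, not the GPU implementation. The function name `functionalDOFs` and the `isFunctional` flag vector, which replaces the kernel's neighbor check, are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Sequential analogue of Algo. 12. `isFunctional[d]` (index 0 unused) stands in
// for the outcome of the kernel's neighbor check on DOF d. DOFs 1..nBC are
// never tested and therefore always kept.
std::vector<int> functionalDOFs(int nDOF, int nBC, const std::vector<bool>& isFunctional) {
    assert(static_cast<int>(isFunctional.size()) == nDOF + 1);

    std::vector<int> fDOF(nDOF);
    std::iota(fDOF.begin(), fDOF.end(), 1);  // fill with 1, 2, ..., nDOF

    // Marking step (done by the kernel on the GPU): skip the boundary DOFs.
    for (int d = nBC + 1; d <= nDOF; ++d) {
        if (!isFunctional[d]) fDOF[d - 1] = -1;  // arbitrary negative value
    }

    // Count the surviving (positive) entries, then compact the vector.
    const auto nfDOF = std::count_if(fDOF.begin(), fDOF.end(),
                                     [](int v) { return v > 0; });
    // remove_if only shifts the kept entries forward and returns the new
    // logical end, so the tail has to be erased explicitly.
    fDOF.erase(std::remove_if(fDOF.begin(), fDOF.end(),
                              [](int v) { return v < 0; }),
               fDOF.end());
    assert(static_cast<std::size_t>(nfDOF) == fDOF.size());
    return fDOF;
}
```

`thrust::remove_if` behaves like `std::remove_if` in this respect: it returns an iterator to the new end and does not change the size of the container, so a GPU version would typically shrink the vector or track nfDOF as the logical length.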

Algorithm 13: calcfDOFkernel

Input: [ρ], nDOF, [fDOF], nBC

Output: Modified [fDOF]

1 tx ← threadIdx.x + blockIdx.x × blockDim.x; // Assign a unique thread ID to every GPU thread

2 for thread ID tx ← nBC + 1 to nDOF do // on GPU

3     sum ← 0;

4     for i ← 1 to nne do // nne is the number of neighbors

5         sum ← sum + ρ[i]; // Sum of ρ for all neighbors

6     end

7     if sum < 1 then

8         fDOF[tx] ← -1; // Arbitrary negative value

9     end

10 end
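Algo. 13 leaves the lookup of a DOF's neighboring elements implicit. The sequential sketch below makes it explicit, with each outer-loop iteration doing the work of one GPU thread. The function name `markNonFunctionalDOFs`, the flat `neighbors` table padded with -1, and the use of densities equal to 0 (void) or 1 (solid) are assumptions for illustration, not details confirmed by the text.

```cpp
#include <cassert>
#include <vector>

// Sequential stand-in for calcfDOFkernel.
// `neighbors` holds `nne` entries per DOF: 0-based indices into `rho` of the
// elements surrounding that DOF, padded with -1 where fewer neighbors exist.
// `fDOF` must already contain 1..nDOF; non-functional entries are set to -1.
void markNonFunctionalDOFs(const std::vector<double>& rho,
                           const std::vector<int>& neighbors,
                           int nne, int nDOF, int nBC,
                           std::vector<int>& fDOF) {
    assert(static_cast<int>(neighbors.size()) == nDOF * nne);
    assert(static_cast<int>(fDOF.size()) == nDOF);

    // 0-based index tx corresponds to DOF number tx + 1, so starting at nBC
    // skips the boundary DOFs 1..nBC.
    for (int tx = nBC; tx < nDOF; ++tx) {
        double sum = 0.0;
        for (int i = 0; i < nne; ++i) {
            const int e = neighbors[tx * nne + i];
            if (e >= 0) sum += rho[e];  // ignore padding entries
        }
        if (sum < 1.0) {
            fDOF[tx] = -1;  // no solid neighbor: mark as non-functional
        }
    }
}
```

In a CUDA kernel the outer loop would be replaced by the thread index computed in line 1 of Algo. 13, usually together with a bounds check so that surplus threads in the last block do nothing.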