
3.5 Simulation Results


This section focuses on the efficient execution of the serial and parallel versions of the proposed methods. Well-known heuristics, namely random walk (RW) [82] and its variant hierarchical random walk (hRW) [21], a standard iterative solver Gauss-Seidel (GS), and an efficient direct solver KLU [19], are evaluated to compare their results against our proposed metaheuristics. In addition, both random walk and an efficient iterative solver, CG [20], are implemented on the GPU platform for comparison with the GPU versions of RFD and TSRW. Various experiments are carried out on different power distribution networks to showcase the effectiveness and performance of both TSRW and RFD. The benchmarks used for experimentation are produced using our in-house power grid generator without loss of generality; they have already been used in the literature [111] to validate the performance of several power grid analyzers. The benchmarks for steady state analysis are generated in the SPICE format with grid sizes ranging from 1 million to 25 million nodes. The metal resistances in these benchmarks are set to 1 Ω, as in industrial designs. VDD PADs are placed randomly across the benchmarks for a close realization of industrial power grid designs. The potential at these PADs is set to 1.8 V, and current sinks with values of 0.01 A are placed on every node except the PADs.
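Since the generator itself is not listed here, the following minimal sketch (hypothetical; the mesh topology, pad count, file name and component naming are all illustrative assumptions) shows how a flat SPICE benchmark with these parameters could be emitted:

```cpp
// Hypothetical sketch of a flat power-grid benchmark generator in the spirit
// of the in-house tool described above (the real generator is not shown in
// the thesis). Emits a SPICE netlist for an N x N resistive mesh with 1-ohm
// metal resistances, randomly placed 1.8 V VDD pads, and 0.01 A sinks.
#include <cstdio>
#include <cstdlib>

int main() {
    const int N = 1000;       // 1000 x 1000 = 1M-node grid (cf. pgnw1M)
    const int numPads = 100;  // assumed pad count, for illustration only
    std::FILE* f = std::fopen("pgnw1M.sp", "w");

    // Horizontal and vertical 1-ohm resistors between adjacent nodes.
    for (int r = 0; r < N; ++r)
        for (int c = 0; c < N; ++c) {
            int n = r * N + c;
            if (c + 1 < N) std::fprintf(f, "Rh%d n%d n%d 1\n", n, n, n + 1);
            if (r + 1 < N) std::fprintf(f, "Rv%d n%d n%d 1\n", n, n, n + N);
        }

    // Randomly placed VDD pads at 1.8 V; every other node sinks 0.01 A.
    bool* isPad = (bool*)std::calloc((size_t)N * N, sizeof(bool));
    for (int p = 0; p < numPads; ++p) {
        int n = std::rand() % (N * N);
        isPad[n] = true;
        std::fprintf(f, "Vpad%d n%d 0 1.8\n", p, n);
    }
    for (int n = 0; n < N * N; ++n)
        if (!isPad[n]) std::fprintf(f, "I%d n%d 0 0.01\n", n, n);

    std::free(isPad);
    std::fclose(f);
    return 0;
}
```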

Table 3.1: Comparison of computational time to perform steady state analysis of power distribution networks.

Nodes      tRW (s)   tGS (s)   thRW (s)   tKLU (s)   tRFD (s)   tRW/tRFD   tGS/tRFD   thRW/tRFD   tKLU/tRFD

pgnw1M     18.84     14.527    9.01       16.607     7.77       2.42×      1.86×      1.15×       2.137×
pgnw4M     102.03    95.453    40.02      156.13     35.05      2.91×      2.72×      1.14×       4.453×
pgnw9M     378.03    269.95    97.23      518.58     69.02      5.47×      3.91×      1.40×       7.515×
pgnw16M    784.20    595.66    206.67     1372.8     140.13     5.59×      4.25×      1.47×       9.8×
pgnw25M    1338.07   1090.99   325.45     3065.21    211.12     6.33×      5.16×      1.54×       14.5×

(The last four columns give the speedup of RFD, i.e., the ratio of each solver's runtime to tRFD.)

The proposed RFD metaheuristic is evaluated on several power distribution benchmarks to compare its steady state solutions with those obtained from other metaheuristics, such as random walk (RW) and hierarchical random walk (hRW). It can be observed from Table 3.1 that speedups of 6.33× and 1.54× have been achieved over RW and hRW, respectively, for a power distribution network having 25 million nodes. It is to be noted that the RW solver is one of the well-known standard power grid analyzers reported in the literature [82]. Nevertheless, RW remains slow on large power grid networks even under suitable general conditions. Because a walk moves back and forth [82], its chances of falling into local traps are higher than in the RFD method; consequently, the time for a walk to complete its first n steps can be of the order of (log n)^3, which is large compared to RFD [82, 112].
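For context, the sketch below shows the serial random-walk game of [82] on a uniform mesh with unit resistances: in the standard formulation a node voltage satisfies V_x = Σ_i (g_i/G) V_i − I_x/G with G = Σ_i g_i, so a walker pays a "motel cost" I_x/G at every visited node, moves to neighbor i with probability g_i/G, and collects the pad voltage on reaching a home node. The grid layout, sink value, and walk count here are illustrative assumptions, not the thesis code.

```cpp
// Minimal serial random-walk voltage estimator in the style of [82],
// for an N x N grid of 1-ohm resistors (every move equiprobable).
#include <cstdlib>
#include <vector>

double estimateVoltage(int start, int N, const std::vector<bool>& isPad,
                       double vdd = 1.8, double sink = 0.01, int walks = 10000) {
    double sum = 0.0;
    for (int w = 0; w < walks; ++w) {
        int node = start;
        double paid = 0.0;
        while (!isPad[node]) {
            int r = node / N, c = node % N;
            // collect available neighbors (boundary nodes have fewer)
            int nb[4], k = 0;
            if (r > 0)     nb[k++] = node - N;
            if (r + 1 < N) nb[k++] = node + N;
            if (c > 0)     nb[k++] = node - 1;
            if (c + 1 < N) nb[k++] = node + 1;
            double G = (double)k;        // total conductance (unit resistors)
            paid += sink / G;            // "motel cost" I_x / G at each stop
            node = nb[std::rand() % k];  // move with probability g_i / G
        }
        sum += vdd - paid;  // walk ends at a pad: collect VDD, minus costs
    }
    return sum / walks;  // Monte-Carlo estimate of the node voltage
}
```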

To showcase the effectiveness and efficiency of our proposed RFD metaheuristic in terms of runtime, we have compared the solutions obtained using RFD with those obtained from a standard iterative solver, Gauss-Seidel (GS), and a direct solver, KLU. As listed in Table 3.1, speedups of 5.16× and 14.5× have been achieved over GS and KLU, respectively, while performing experiments on a power grid benchmark having 25 million nodes. GS is slow in evaluating large power grid networks because of its slow rate of convergence.

Similarly, for the large system matrices resulting from large power distribution networks, KLU becomes far more compute intensive than RFD.

As large power distribution networks place great demands on memory bandwidth and computational power, the scientific calculations needed to estimate hotspots necessitate the use of high performance computing units such as GPUs. To achieve efficient propagation of drops and realistic simulations on large power grid networks, TSRW and RFD are implemented on the Nvidia GPU platform for accelerated convergence. Each GPU contains a number of streaming multiprocessors, each comprising many cores that execute threads in parallel to perform compute intensive tasks. For large power grid networks, RFD takes advantage of the GPU programming model (CUDA), which allows memory transfers between CPU and GPU to be carried out simultaneously with extensive computations (kernels). These kernels are launched to generate a number of threads organized in an array of thread blocks.
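As a concrete illustration of this overlap (a sketch with assumed buffer names, sizes, and a placeholder kernel, not the thesis implementation), an asynchronous copy on one CUDA stream can proceed while a kernel executes on another:

```cuda
// Minimal sketch of copy/compute overlap with CUDA streams.
#include <cuda_runtime.h>

__global__ void propagateDrops(float* pot, int totalNode) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < totalNode) pot[i] *= 1.0f;  // stands in for a real drop update
}

int main() {
    const int totalNode = 1 << 20;
    const size_t bytes = totalNode * sizeof(float);
    float *h_buf, *d_cur, *d_next;
    cudaMallocHost((void**)&h_buf, bytes);  // pinned memory enables async copies
    cudaMalloc((void**)&d_cur, bytes);
    cudaMalloc((void**)&d_next, bytes);

    cudaStream_t copyStream, computeStream;
    cudaStreamCreate(&copyStream);
    cudaStreamCreate(&computeStream);

    // Stage the next batch of node data while the current batch computes.
    cudaMemcpyAsync(d_next, h_buf, bytes, cudaMemcpyHostToDevice, copyStream);
    propagateDrops<<<200, 256, 0, computeStream>>>(d_cur, totalNode);

    cudaStreamSynchronize(copyStream);
    cudaStreamSynchronize(computeStream);
    cudaFree(d_cur); cudaFree(d_next); cudaFreeHost(h_buf);
    return 0;
}
```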

The number of threads in each block (represented by the blockDim variable) is specified before launching the kernel. To start the execution of RFD on GPU, each drop is mapped to a thread in a block and is associated with a unique global index. Each thread combines its thread index threadIdx and block index blockIdx to form a unique data index i.

[Figure 3.6: Block representation of a kernel. Threads of each of the 200 blocks compute the data index as i = totalNode*(blockIdx.x)/gridDim.x + threadIdx.x.]

This index is calculated as [20]

i = (blockIdx.x ∗ blockDim.x) + threadIdx.x    (3.32)

In our implementation, the term (blockIdx.x ∗ blockDim.x) is taken as (blockIdx.x ∗ totalNode)/gridDim.x, and the grid size is set to 200 for optimum performance, as shown in Figure 3.6. During kernel execution of RFD and TSRW, drops and walks, respectively, are allowed to propagate across the power distribution network by executing threads of the same block for faster convergence. However, no synchronization methods are employed among the threads of a block, either during execution or after control returns from GPU to CPU. Even though only simple kernels are implemented, the performance of TSRW and RFD improves on GPU: speedups of 67× and 2.15× have been obtained over serial RW and serial RFD, respectively, on a power distribution network having 25 million nodes.
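A minimal sketch of a kernel using this index mapping is shown below; the kernel name and the drop-update body are placeholders, not the thesis code:

```cuda
#include <cuda_runtime.h>

__global__ void rfdDropKernel(float* potential, int totalNode) {
    // Data index per Eq. (3.32) as modified above:
    //   i = totalNode * blockIdx.x / gridDim.x + threadIdx.x
    // (64-bit arithmetic avoids overflow on multi-million-node grids)
    long long i = ((long long)totalNode * blockIdx.x) / gridDim.x + threadIdx.x;
    if (i < totalNode) {
        // placeholder: the RFD altitude/erosion update for the drop at node i
    }
}

int main() {
    const int totalNode = 25000000;  // pgnw25M
    float* d_potential;
    cudaMalloc((void**)&d_potential, totalNode * sizeof(float));
    // grid size fixed at 200 blocks for optimum performance, as in the text
    rfdDropKernel<<<200, 256>>>(d_potential, totalNode);
    cudaDeviceSynchronize();
    cudaFree(d_potential);
    return 0;
}
```

With 200 blocks of 256 threads, one launch covers 51,200 drops; a production kernel would additionally loop each thread over its block's share of the totalNode nodes.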

It is observed from Figure 3.7(a) and Figure 3.7(b) that speedups of 5.75× and 1.27× have been achieved over the GPU implementation of RW using TSRW and RFD, respectively, for a power distribution network having 25 million nodes. TSRW thus demonstrates much better performance on GPU than RFD. The modest performance of RFD on GPU stems from the use of simple kernels, the absence of synchronization among threads of the same block, and the inefficient use of a helper kernel to gather results after the computation threads finish. In view of this, a significant milestone has not been achieved in terms of runtime while implementing RFD on GPU.


[Figure 3.7: Speedup achieved by (a) TSRW and (b) RFD on GPU over serial RFD (RFDserial), RW on GPU (RWGPU) and CG on GPU (CGGPU) while performing steady state analysis on different power distribution networks (pgnw1M to pgnw25M).]

However, the performance of RFD on GPU can be improved by employing efficient helper kernels, which can reduce the reverse data transfer penalty from GPU to CPU (a hedged sketch of such a kernel follows this paragraph). It can also be improved by making use of idle threads while the execution of the current thread stalls on memory accesses. Further, similar to TSRW, a two-step strategy could be employed with the RFD metaheuristic to improve its performance on GPU. We have also compared the results of the TSRW and RFD heuristics with the solutions obtained from CG implemented on the GPU platform to showcase the runtime improvement. It can be observed from Figure 3.7(a) and Figure 3.7(b) that TSRW and RFD on GPU demonstrate speedups of 58.14× and 11.83×, respectively, over CG (on GPU) while analyzing a power distribution network having 25 million nodes.
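One possible form of such a helper kernel (an assumed sketch, not the thesis code) reduces per-node residuals on the device so that only one scalar per block crosses the bus:

```cuda
#include <cuda_runtime.h>

// Max absolute change between two passes, reduced on the GPU.
// Assumes a launch of <<<200, 256>>> to match the text's grid size.
__global__ void maxResidual(const float* oldPot, const float* newPot,
                            float* blockMax, int totalNode) {
    __shared__ float sdata[256];
    float r = 0.0f;
    // grid-stride loop so 200 blocks can cover millions of nodes
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < totalNode;
         i += gridDim.x * blockDim.x) {
        float d = fabsf(newPot[i] - oldPot[i]);
        if (d > r) r = d;
    }
    sdata[threadIdx.x] = r;
    __syncthreads();
    // tree reduction within the block for the maximum residual
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s)
            sdata[threadIdx.x] = fmaxf(sdata[threadIdx.x], sdata[threadIdx.x + s]);
        __syncthreads();
    }
    // only gridDim.x floats (here 200) are ever copied back to the host
    if (threadIdx.x == 0) blockMax[blockIdx.x] = sdata[0];
}
```

The host then copies back just 200 floats and takes their maximum, instead of transferring the full potential vector after every pass.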

[Figure 3.8: Percentage error evaluated during steady state analysis for serial RW (RWserial), RW on GPU (RWGPU), serial RFD (RFDserial), RFD on GPU (RFDGPU) and TSRW on the pgnw1M to pgnw25M benchmarks.]

One of the reasons for the large speedup of TSRW over the other algorithms is its shorter walk lengths. In the serial random walk method, nodes whose potentials become known act as new home nodes only once per iteration of the outer loop, whereas in the TSRW method multiple random walks take place simultaneously, producing more home nodes at each step. More home nodes at each iteration result in shorter walk lengths for the unprocessed nodes, which in turn speeds up the computation in a non-deterministic way. A general trend of shortening walk lengths after each iteration is also observed during the execution of RW and hRW, as nodes with known potentials become new home nodes, but this process takes place at a much slower pace. A sketch of this idea is given below.
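The following kernel sketches the two-step scheme under several assumptions: a unit-resistance mesh stored as a fixed-degree adjacency list, cuRAND for per-thread randomness, and host code (not shown) that promotes newly estimated nodes to home nodes between launches. It illustrates the idea rather than reproducing the thesis implementation.

```cuda
#include <curand_kernel.h>

// One thread per walk; every node whose potential is already known acts as
// a home node, so walks launched in later waves terminate sooner.
__global__ void tsrwWalkKernel(const int* nbr, const int* deg,
                               const float* knownPot, const unsigned char* known,
                               const int* startNode, float* estimate,
                               int numWalks, float sink, unsigned long long seed) {
    int w = blockIdx.x * blockDim.x + threadIdx.x;
    if (w >= numWalks) return;
    curandState rng;
    curand_init(seed, w, 0, &rng);

    int node = startNode[w];
    float paid = 0.0f;
    while (!known[node]) {
        int k = deg[node];                          // unit conductances: G = degree
        paid += sink / k;                           // motel cost I_x / G
        node = nbr[4 * node + (curand(&rng) % k)];  // mesh degree is at most 4
    }
    estimate[w] = knownPot[node] - paid;            // award collected at a home node
}
```

Between launches, the host marks nodes whose estimates are complete as known, so each successive wave of walks faces more home nodes and shorter paths.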

Moreover, the residual error (average error) is kept under a satisfactory level of 3%, as shown in Figure 3.8, during the implementation of the RFD metaheuristic, which is considerably lower than that of RW (both serial and GPU implementations) and the proposed TSRW metaheuristic. As the size of the power distribution networks increases, it becomes challenging to keep the error under this level. However, RFD demonstrates superior accuracy and outperforms the RW-based metaheuristics (RW, RWGPU and TSRW) owing to the unidirectional movement of drops, which helps them escape local traps and reach their destinations within reasonable computational time.
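For reference, the percentage error plotted in Figure 3.8 can be computed as a mean relative deviation from a reference solution; the exact formula used in the thesis is not stated here, so this averaging is an assumption:

```cpp
#include <cmath>
#include <vector>

// Mean absolute relative error (in percent) of estimated node voltages
// against a reference solver's solution; assumed form of the Fig. 3.8 metric.
double percentageError(const std::vector<double>& est,
                       const std::vector<double>& ref) {
    double err = 0.0;
    for (std::size_t i = 0; i < est.size(); ++i)
        err += std::fabs(est[i] - ref[i]) / std::fabs(ref[i]);
    return 100.0 * err / est.size();
}
```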


Table 3.2: Comparison of computational time (on CPU) for transient analysis of power distribution networks.

Benchmark¹   tRW (s)   tGS (s)   thRW (s)   tRFD (s)   tRW/tRFD   tGS/tRFD   thRW/tRFD

pgtnw10K     3.52      6.17      3.12       2.67       1.31×      2.31×      1.16×
pgtnw20K     9.49      15.45     6.03       5.10       1.86×      3.02×      1.18×
pgtnw30K     16.91     23.17     10.67      8.85       1.91×      2.61×      1.21×
pgtnw40K     27.68     31.86     19.89      13.21      2.09×      2.41×      1.50×
pgtnw50K     36.94     39.02     24.23      19.56      1.88×      1.99×      1.23×
pgtnw100K    102.11    98.14     59.30      46.85      2.18×      2.10×      1.26×
pgtnw500K    923.78    881.67    352.03     287.38     3.21×      3.06×      1.22×
pgtnw1M      2103.30   -         859.20     730.80     2.87×      -          1.18×
pgtnw4M      8865.02   -         4214.60    2895.00    3.05×      -          1.45×

¹ The suffix (#) in pgtnw# denotes the number of nodes in the power distribution network.

Further, transient analysis has been performed on several RC-modeled power distribution benchmarks to demonstrate the effectiveness of the RFD metaheuristic. The benchmarks for transient analysis are generated in the SPICE format with grid sizes ranging from ten thousand to 4 million nodes. The metal resistances in these benchmarks are set to 1 Ω and the capacitances to 0.1 µF, similar to industrial designs. VDD PADs with a potential of 1.8 V are placed randomly across the benchmarks for a close realization of industrial power grid designs, and current sinks with values of 0.01 A are placed on every node except the PADs. The CPU times are measured after the completion of all timesteps. The results of the different algorithms are compared in Table 3.2. It is observed that speedups of 3.21×, 3.06× and 1.22× have been obtained over the RW, GS and hRW algorithms, respectively, while analyzing a power distribution network having 500,000 nodes (500 timesteps are taken within a time period of 1 ms).
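The per-timestep computation follows from the capacitor companion model. Assuming a backward Euler discretization (the integrator is not named in this section, so this is an assumption), a grounded capacitance C with timestep h contributes a conductance and a history current:

\[
  i_C^{(k)} = C\,\frac{v^{(k)} - v^{(k-1)}}{h}
  \;\Longrightarrow\;
  G_C = \frac{C}{h}, \qquad
  I_{\mathrm{eq}}^{(k)} = \frac{C}{h}\,v^{(k-1)},
\]

so every timestep reduces to a steady-state-like solve \((G + G_C)\,v^{(k)} = I + I_{\mathrm{eq}}^{(k)}\); with 500 timesteps over 1 ms, h = 2 µs for the runs reported in Table 3.2.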

With a further increase in the size of the power distribution networks, the iterative solver GS fails to provide solutions for two benchmarks (pgtnw1M and pgtnw4M), while RFD demonstrates speedups of 3.05× and 1.45× over the RW and hRW algorithms, respectively, on the pgtnw4M benchmark. The average residual error is kept under the satisfactory level of 3% throughout these experiments, with a maximum error of 10 mV observed while solving the pgtnw4M benchmark.
