5.3 Proposed Energy Saving Policy
5.3.2 DiCeR with DAM at Multiple Granularities
Applying CMP-SVR in a cache bank increases its effective associativity; hence, more leakage energy can be saved by shutting down some of the cache ways.
Therefore, we can also apply way shutdown in the powered-on banks by using power gating circuitry [65]. Here again, way shutdown reduces the associativity of each set and thus increases conflict misses. However, performance-aware way shutdown can keep the IPC within a permissible degradation limit.
The proposed dynamic cache resizing saves more leakage energy in the LLC (the L2 cache in this case) without significant degradation in performance. It goes through three main phases:
• Bank shutdown: In this phase, the least used cache banks are selected and turned off. Future requests to these banks are redirected to other powered-on (target) banks. Bank shutdown continues until either the performance degrades beyond the allowable threshold or the number of banks turned off reaches a predefined maximum limit.
• Way shutdown: Bank-level shutdown saves leakage energy. However, banks with average usage are not suitable candidates for complete power-off. Instead, in such banks it is desirable to turn off some of the ways in every set.
Chapter 5. DiCeR with DAM 116
• Associativity management: After way shutdown, the effective associativity of the bank is reduced, which increases the number of conflict misses. In particular, the sets that are in high demand will suffer more misses, which might lead to poorer performance. To mitigate these two issues, we apply CMP-SVR to dynamically increase the associativity of the sets. This allows better utilisation of the available bank capacity.
Algorithm 4 gives the details of the process. The application is run for a certain number of cycles before each reconfiguration decision. This interval can be determined by profiling. The controller collects cache bank usage through a counter attached to each bank that dynamically monitors its accesses, and the banks with minimum usage are selected for shutdown. Bank shutdown is repeated on other banks as long as the system is able to maintain its performance.
In case the performance starts degrading beyond a predefined limit, the bank shutdown process is stopped. The process is also stopped in case the number of banks turned off reaches a maximum limit (which can be set by using profiling).
This limit also prevents application thrashing. After the bank to turn off (called the victim bank) is selected, its valid data blocks are transferred to a selected target bank, as in BSP/DiCeR. The placement of redirected/forwarded data in target banks also follows the same mechanism as DiCeR, i.e. an implementation of Reuse Counter [18].
The target bank is another active bank that has average usage and is in close proximity (in terms of network hop distance) to the shutdown bank, as in BSP SVR. Distance-cognizant target bank selection reduces the network overhead of data migration and also the latency of subsequent requests forwarded from the shutdown bank to the target bank, as in BSP SVR.
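The target selection rule described above can be sketched as follows. This is only an illustrative sketch: the 4-wide mesh topology, the Manhattan hop distance, and the `band` tolerance used to model "average usage" are assumptions introduced here, and the function names are hypothetical.

```python
from statistics import mean


def hop_distance(a, b, mesh_width=4):
    """Manhattan hop distance between tiles a and b on a 2D mesh (assumed topology)."""
    ax, ay = a % mesh_width, a // mesh_width
    bx, by = b % mesh_width, b // mesh_width
    return abs(ax - bx) + abs(ay - by)


def select_target(victim, usage, active, band=0.25, mesh_width=4):
    """Pick the nearest active bank whose usage is close to the mean.

    'Average usage' is modelled as usage within +/- band * mean of the mean
    usage over the active banks. The band is an assumption of this sketch.
    Returns None when no bank qualifies.
    """
    avg = mean(usage[b] for b in active)
    candidates = [
        b for b in active
        if b != victim and abs(usage[b] - avg) <= band * avg
    ]
    if not candidates:
        return None
    # Nearest candidate first; ties are broken by the lower bank id.
    return min(candidates, key=lambda b: (hop_distance(victim, b, mesh_width), b))


if __name__ == "__main__":
    usage = {0: 10, 1: 300, 2: 100, 3: 110, 4: 90, 5: 50}
    print(select_target(0, usage, list(usage)))
```

Heavily used banks (bank 1 here) and lightly used banks (bank 5) fall outside the band, so only mid-usage banks compete on distance.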
A bank with average usage is chosen as the target because a heavily used bank will be unable to handle the additional load, whereas a lightly used bank is itself a possible candidate for shutdown. As in DiCeR, target banks are not permitted to be shut down, in order to avoid transitive redirection of requests. During block
ALGORITHM 4: Algorithm for cache resizing
T : Reconfiguration interval
δ : Permissible percentage degradation in IPC
m : Maximum limit on bank shutdown
j = 0 : Number of banks turned off. Initially zero.
while (j < m) do
    Run the application for T clock cycles.
    Compute the degradation in IPC compared to the original average IPC.
    if degradation is greater than δ then
        Do not shut down a bank.
        break
    end if
    Call manageShutdown()
    j++
end while
Run the application with the available number of cache banks for T clock cycles.
Call wayShutdown()
Keep the current configuration until the end of execution.

Function : manageShutdown()
    Calculate the usage of every active cache bank.
    Select the minimum used cache bank, Bi, as the victim bank.
    Select another bank, Bj, as the target bank, that is closest to Bi and has average usage.
    if CMP-SVR is not activated on Bj then
        Activate CMP-SVR on Bj.
    end if
    Stall requests for Bi, but keep the response queue open.
    Migrate all valid blocks from Bi to Bj. Remap all future requests to Bj.
    Turn off bank Bi.

Function : wayShutdown()
    Activate CMP-SVR in non-target powered-on banks (if any).
    Turn off some ways from the NT partition of the active banks.
    Run the system for T clock cycles.
    if IPC degradation over the last reconfiguration period is less than δ then
        Turn off ways from the RT partition of average-usage powered-on banks.
    end if
    Return.
Figure 5.9: Way shutdown using CMP-SVR.
transfer, only the accesses to the victim bank are stalled, and the other components can continue execution. The last step is to enable CMP-SVR on the target bank.
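The top-level control flow of Algorithm 4 can be sketched as below. This is a minimal sketch, not the simulator implementation: `run_interval`, `manage_shutdown` and `way_shutdown` are hypothetical callables standing in for running the workload for T cycles (returning the measured IPC), shutting down one bank, and the way-level phase, respectively.

```python
def resize_cache(run_interval, manage_shutdown, way_shutdown,
                 baseline_ipc, delta_pct, max_banks_off):
    """Bank shutdown phase followed by a single way shutdown phase.

    run_interval()    -> runs T cycles and returns the measured IPC (hypothetical hook)
    manage_shutdown() -> turns off one victim bank after migration (hypothetical hook)
    way_shutdown()    -> performs the way-level phase (hypothetical hook)
    Returns the number of banks turned off.
    """
    banks_off = 0
    while banks_off < max_banks_off:
        ipc = run_interval()
        # Percentage IPC loss relative to the original average IPC.
        degradation = 100.0 * (baseline_ipc - ipc) / baseline_ipc
        if degradation > delta_pct:
            break  # performance limit reached: stop shutting down banks
        manage_shutdown()
        banks_off += 1
    # Run once more with the remaining banks, then try way shutdown.
    run_interval()
    way_shutdown()
    return banks_off


if __name__ == "__main__":
    # Toy model: every bank that is turned off costs 0.05 IPC.
    state = {"off": 0}

    def run():
        return 2.0 - 0.05 * state["off"]

    def shut_one_bank():
        state["off"] += 1

    print(resize_cache(run, shut_one_bank, lambda: None, 2.0, 6.0, 8))
```

The loop stops on whichever condition is met first: the degradation limit δ or the maximum number of banks m.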
5.3.2.1 Effects of Way Shutdown on CMP-SVR
Once the limit on turned-off banks is reached, either because of performance degradation or because of the maximum bank-off limit, one can save further static power by turning off cache ways in the currently active cache banks. We choose to turn off ways from the NT partition, because CMP-SVR increases associativity using the RT partition of the sets. Thus, even if a set has fewer ways in NT, a set in high demand can still fulfill its requests by spilling data into the RT partition of its friend sets. One can be more aggressive and also shut down ways from the RT partition; the proposed policy attempts this, provided it does not degrade performance beyond the limit. The results are shown for both categories: NT-only shutdown, and NT as well as RT shutdown. All dynamic reconfiguration overheads, along with the remapping, have been considered during simulation.
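The NT-first, RT-second order of way shutdown can be sketched as follows. This is an illustrative sketch under stated assumptions: each bank is a plain dictionary with hypothetical fields, the number of ways to turn off is passed in by the caller, and the degradation is measured against the original baseline IPC.

```python
def way_shutdown(banks, nt_off, rt_off, run_interval, baseline_ipc, delta_pct):
    """Turn off NT ways first; turn off RT ways only if performance allows.

    banks: list of dicts with keys 'nt_ways', 'rt_ways', 'average_usage'
           and 'cmp_svr' (hypothetical representation).
    run_interval(): runs T cycles and returns the measured IPC (hypothetical hook).
    Returns True if RT ways were also turned off.
    """
    for bank in banks:
        bank["cmp_svr"] = True  # already active on target banks
        bank["nt_ways"] -= nt_off
    ipc = run_interval()
    degradation = 100.0 * (baseline_ipc - ipc) / baseline_ipc
    if degradation < delta_pct:
        for bank in banks:
            if bank["average_usage"]:
                bank["rt_ways"] -= rt_off
        return True
    return False


if __name__ == "__main__":
    banks = [
        {"nt_ways": 4, "rt_ways": 4, "average_usage": True, "cmp_svr": False},
        {"nt_ways": 4, "rt_ways": 4, "average_usage": False, "cmp_svr": True},
    ]
    print(way_shutdown(banks, 2, 2, lambda: 1.98, 2.0, 5.0), banks)
```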
Figure 5.9 illustrates the way shutdown proposal for a sample cache. Here the cache is 8-way associative with 8 sets. There are two fellow groups of 4 sets each. Each set is divided into 4 ways for RT and 4 ways for NT. The basic set associativity is 8. Applying CMP-SVR with a fellow group of size 4, we get a maximum associativity of 8 + 4×3 = 20, since a set can also use the 4 RT ways of each of the 3 other sets in its group. If we shut down 2 ways from the NT
Cache Parameters          Values
Cache Level               L2
Size of a L2 Bank         256 KB / 128 KB
Block Size                64 Bytes
Technology used           32nm
Associativity             4 / 8
Cache Model               NUCA
Operating Temperature     360 K
Actual Cache Size         4 MB / 2 MB
Table 5.1: CACTI Configurations
Components                    Parameters
No. of Tiles                  16
Processor                     UltraSPARCIII+
L1 I/D Cache                  64KB, 4-way
L2 Cache bank                 128KB/256KB, 8-way
Memory bank                   1GB, 4KB/page
Flit Size, Buffer Size        16 bytes, 4
Pipeline Stage                5-stage
VCs per Virtual Network       4
Number of Virtual Networks    5
Table 5.2: System and Network Parameters
portion, we are still able to maintain performance, as the maximum associativity is now 18. Shutting down 2 ways each from NT and RT results in a maximum associativity of 10.
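The associativity figures in this example can be checked with a small calculation. The helper below is a sketch with a hypothetical name; it assumes, as in the example, that a set can use its own NT and RT ways plus the RT ways of every other set in its fellow group, and that the same number of ways is turned off in every set.

```python
def max_associativity(nt_ways, rt_ways, group_size):
    """Ways reachable by one set: its own NT and RT ways plus the RT ways
    of the other (group_size - 1) sets in its fellow group."""
    return nt_ways + rt_ways + (group_size - 1) * rt_ways


if __name__ == "__main__":
    print(max_associativity(4, 4, 4))  # no ways turned off
    print(max_associativity(2, 4, 4))  # 2 NT ways turned off
    print(max_associativity(2, 2, 4))  # 2 NT and 2 RT ways turned off
```

Turning off an NT way costs one way of maximum associativity, whereas turning off an RT way in every set costs one way per set in the fellow group, which is why the policy treats RT shutdown as the more aggressive step.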