This thesis considers a modern 16-core CMP modelled on UltraSPARC III cores, in both the TCMP and the CCMP architectures discussed earlier. Throughout the thesis, LLC refers to the on-chip L2 cache unless otherwise specified. The major contributions of this thesis can be summarised as follows:
1.6.1 DiCeR at Cache Bank Level
R1 at DCH : “More food is bad for health”
In our first leakage saving technique, we reduce the cache size by only shutting down cache banks, never restarting them, until the performance degradation reaches an allowable limit. This technique is referred to as BSP. However, this policy cannot provide additional cache space to a process that needs more of it later in its execution. Hence, we further propose a dynamic cache tuning technique that treats performance and locality of reference as system-wide constraints when managing the cache size in our baseline architecture, shown in Figure 1.2(a). To save leakage power, L2 cache banks are shut down at runtime based on usage statistics, and their future accesses are remapped to other active L2 cache banks, called target banks.
Additionally, this policy handles sudden increases in the application's working set size (WSS) during execution by allowing powered-off cache banks to be restarted dynamically. System performance is monitored periodically, and L2 bank(s) are restarted if the performance degradation exceeds a threshold value. When a bank is turned back on, all of its remapped contents are brought back from their remapped locations to the home bank. The results are compared with BSP and with an existing policy, Drowsy Cache [1]. In particular, the following variations are proposed: (1) B ON OFF ALL. The system can shut down or turn on cache banks periodically throughout the process execution. (2) B OFF ONCE. The system shuts down banks initially, but once bank restarting begins, no further shutdown is permitted. (3) B ON OFF OPT. This policy resizes the cache like the first policy, but with predefined time slices during which the cache cannot be resized.
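The periodic resizing loop behind the three variations can be sketched as a small epoch-based controller. This is a minimal sketch under stated assumptions, not the thesis implementation: the class name `BankResizer`, the per-epoch `degradation` input, the choice of the least-used bank as victim and of the least-used remaining bank as target, and the last-off/first-on restart order are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class BankResizer:
    """Hypothetical epoch-based bank shutdown/restart controller (illustrative only)."""
    num_banks: int
    threshold: float                        # allowed slowdown, e.g. 0.05 for 5%
    policy: str = "B_ON_OFF_ALL"            # or "B_OFF_ONCE", "B_ON_OFF_OPT"
    frozen_epochs: frozenset = frozenset()  # B_ON_OFF_OPT: epochs with no resizing
    active: set = field(init=False, default_factory=set)
    remap: dict = field(init=False, default_factory=dict)  # off bank -> target bank
    restarted: bool = field(init=False, default=False)

    def __post_init__(self):
        self.active = set(range(self.num_banks))

    def step(self, epoch, degradation, usage):
        """Run one monitoring epoch; usage maps bank id -> accesses in that epoch."""
        if self.policy == "B_ON_OFF_OPT" and epoch in self.frozen_epochs:
            return "hold"  # resizing is forbidden in this time slice
        if degradation > self.threshold:
            if not self.remap:
                return "hold"  # nothing to restart
            bank, _target = self.remap.popitem()  # most recently shut-down bank
            self.active.add(bank)  # its remapped blocks would migrate back home here
            self.restarted = True
            return f"on:{bank}"
        if self.policy == "B_OFF_ONCE" and self.restarted:
            return "hold"  # no further shutdown once restarting has begun
        # Never shut down a bank that is currently serving as a target.
        candidates = self.active - set(self.remap.values())
        if not candidates or len(self.active) < 2:
            return "hold"
        victim = min(candidates, key=lambda b: (usage.get(b, 0), b))
        self.active.remove(victim)
        target = min(self.active, key=lambda b: (usage.get(b, 0), b))
        self.remap[victim] = target
        return f"off:{victim}->{target}"
```

For example, with `u = {0: 100, 1: 5, 2: 50, 3: 10}`, `BankResizer(4, 0.05).step(0, 0.0, u)` shuts down bank 1 and remaps it to bank 3 (`"off:1->3"`). Excluding current targets from victim selection is a simplification that avoids chains of remapped banks.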
Of these three policies, the first and third both save 65% of leakage energy on average, more than the second, which saves around 46%. However, the unrestricted cache resizing in B ON OFF ALL yields a smaller EDP gain than B ON OFF OPT. The average EDP gains for the three policies are 29%, 27% and 30%, respectively.
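The EDP figures quoted in this chapter combine energy and execution time into one number. As a sketch, assuming that "EDP gain" denotes the fractional reduction in energy-delay product relative to the baseline, the arithmetic is shown below; the input values are hypothetical and are not results from this thesis.

```python
def edp(energy_j: float, delay_s: float) -> float:
    """Energy-delay product: total energy (J) times execution time (s)."""
    return energy_j * delay_s


def edp_gain(base_energy: float, base_delay: float,
             new_energy: float, new_delay: float) -> float:
    """Fractional EDP reduction of a policy relative to the baseline (assumed definition)."""
    return 1.0 - edp(new_energy, new_delay) / edp(base_energy, base_delay)


# Hypothetical numbers: 30% less energy for a 2% slowdown.
print(round(edp_gain(10.0, 1.0, 7.0, 1.02), 3))  # 0.286
```

This shows why an aggressive policy can save as much leakage as another yet report a lower EDP gain: any extra slowdown multiplies into the product.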
1.6.2 DiCeR in combination with DAM technique
R2 at DCH : “Better organ movements with restricted food supply”
In BSP, shutting down banks saves static power but reduces the performance of the LLC. When multiple banks are shut down, the target banks may also become overloaded. Additionally, request forwarding increases on-chip traffic.
Therefore, to improve the performance of the target banks, we use a dynamic associativity management (DAM) technique called CMP-SVR [34]. Furthermore, the cost of request forwarding is optimized by considering network distance as an additional metric for target selection. These two strategies help to reduce performance degradation.
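One way to fold network distance into target selection is to score each candidate bank by its recent load plus a weighted hop count from the home bank, since forwarded requests must travel that distance. The sketch below assumes a 2D mesh with XY routing (Manhattan hops) and a hypothetical weight `alpha`; the function names and the additive scoring rule are illustrative and not taken from the thesis.

```python
def hops(a, b):
    """Manhattan (XY-routing) hop count between two mesh coordinates."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def pick_target(off_bank, active, pos, usage, alpha=1.0):
    """Choose the active bank minimising load plus weighted forwarding distance.

    off_bank: bank being shut down; active: candidate target banks;
    pos: bank -> (x, y) mesh position; usage: bank -> recent access count;
    alpha: hypothetical weight converting hops into access-count units.
    """
    home = pos[off_bank]
    # Ties are broken by bank id so the choice is deterministic.
    return min(active, key=lambda b: (usage[b] + alpha * hops(home, pos[b]), b))
```

With `alpha=0` this degenerates to plain least-used selection; increasing `alpha` trades target load for shorter forwarding paths.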
While applying DAM on top of BSP, we further reduce leakage energy by turning off cache ways and applying associativity management. This policy goes through the following phases:
• Bank shutdown: The least used cache banks are turned off until either the performance degrades beyond the allowable threshold or the number of banks turned off reaches a predefined maximum limit.
• Way shutdown: When a bank is not suitable for complete power-off, it is desirable to turn off some of the ways in every set instead. DAM is incorporated here to recover performance after way shutdown.
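The two phases can be sketched as a single planning function. This is a minimal sketch assuming a hypothetical slowdown predictor `degradation_of(banks_off, ways_off)` and a uniform number of ways turned off per set in every bank that stays on; the DAM step (CMP-SVR) that follows way shutdown is not modelled here.

```python
def two_phase_shutdown(bank_usage, max_off, degradation_of, threshold,
                       ways=16, way_step=4):
    """Return (banks_off, ways_off_per_set) under a performance threshold.

    bank_usage: bank -> access count; degradation_of(banks_off, ways_off)
    is a hypothetical predictor of slowdown; ways_off applies to every set
    of every bank that stays on.
    """
    off = []
    # Phase 1: bank shutdown, least-used banks first.
    for bank in sorted(bank_usage, key=lambda b: (bank_usage[b], b)):
        if len(off) == max_off or len(off) == len(bank_usage) - 1:
            break  # bank limit reached (always keep at least one bank on)
        if degradation_of(off + [bank], 0) > threshold:
            break  # one more bank would exceed the allowed degradation
        off.append(bank)
    # Phase 2: way shutdown in the remaining banks, keeping at least one way on.
    ways_off = 0
    while (ways_off + way_step < ways
           and degradation_of(off, ways_off + way_step) <= threshold):
        ways_off += way_step
    return off, ways_off
```

With a toy linear model such as `lambda off, w: 2 * len(off) + w` (slowdown in percent) and a 10% threshold, the bank limit stops phase 1 at three banks and phase 2 then removes 4 ways per set.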
This policy saves 70% of leakage energy with negligible performance degradation and outperforms some prior works. The average EDP gain for this policy is around 35%.
1.6.3 DiCeR for temperature control in TCMP
R3 at DCH : “Recess for the exhausted and no food to the idle”
Most DTM techniques apply DVFS or task migration to reduce core temperature, as cores are considered the hottest on-chip components. On the other hand, modern large on-chip Last Level Caches (LLCs) occupy the largest on-chip area and are significant contributors to on-chip leakage power consumption. Since power reduction plays a pivotal role in temperature reduction, DiCeR not only reduces leakage power consumption but can also create on-chip thermal buffers that reduce both the average and the peak temperature of the chip without disturbing the computation. To use the cache for temperature control, cache resizing decisions in a TCMP (ref. Figure 1.2(a)) are taken based on the observed cache hotspots and/or access patterns during process execution.
The major contributions of this work can be summarised as follows:
1. Hot Bank [HB]. As leakage power increases quadratically with temperature, it is better to turn off a heavily accessed “hot” bank and distribute its blocks to a colder target bank, reducing both the cache hotspot and the leakage.
2. Cold Bank [CB]. As leakage increases quadratically with temperature, and temperature in turn rises with leakage (a circular dependency), it is beneficial to shut down the least accessed and comparatively colder banks so that their power consumption drops to zero. The small number of remapped requests from these banks will not drastically affect the target banks.
3. Both of these policies are compared with a greedy DVFS technique [2] that applies per-core DVFS whenever the threshold temperature is violated.
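The difference between HB and CB lies mainly in how the victim bank is chosen. The sketch below assumes per-bank temperature and access-count readings are available each interval; the tie-breaking rules and the choice of the coldest remaining bank as target for both policies are hypothetical simplifications, and the greedy DVFS baseline is not modelled.

```python
def thermal_victim(temp, acc, policy):
    """Return (victim, target) bank ids for one resizing decision.

    temp: bank -> temperature in degrees C; acc: bank -> recent access count.
    policy: "HB" (hot bank) or "CB" (cold bank).
    """
    if policy == "HB":
        # Hottest bank first; among equally hot banks, the most accessed.
        victim = max(temp, key=lambda b: (temp[b], acc[b]))
    elif policy == "CB":
        # Least accessed bank first; among equals, the coldest.
        victim = min(temp, key=lambda b: (acc[b], temp[b]))
    else:
        raise ValueError(f"unknown policy: {policy}")
    # Assumed target rule: the coldest remaining bank absorbs the remapped blocks.
    target = min((b for b in temp if b != victim), key=lambda b: (temp[b], acc[b]))
    return victim, target
```

HB attacks the hotspot directly but pushes many requests to the target, whereas CB removes a bank whose remapped traffic is small, which matches the trade-off described above.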
Through this thermally efficient cache resizing, we reduce the average chip temperature by up to 4°C. The maximum leakage saving achieved through this policy is more than 40%, with slight performance degradation. These leakage savings, at a slight performance cost, yield average EDP gains of 21% and 29% for HB and CB, respectively.
1.6.4 DiCeR for temperature control in CCMP
R4 at DCH : “Don’t disturb your peers”
This contribution analyses the role of a centralised multi-banked SNUCA LLC in thermal management while maintaining system performance. We dynamically resize the LLC to optimally balance performance and chip temperature, offering two levels of thermal management: (i) controlling cache temperature, and (ii) reducing the temperature of global hotspots by governing conductive heat transfer.
The major contributions of this work can be summarised as follows:
1. Considering performance as a system-wide constraint, we have developed an analytical model for our architecture using the method of Lagrange multipliers [35] to determine the optimal cache size.
2. The analytically determined optimal cache size is used to resize the LLC following three thermally efficient patterns:
• AltRow. Shuts down alternate rows of cache banks.
• Chess. Generates a chessboard-like pattern in LLC resizing.
• OptTar. Cache banks closer to cores are assigned the highest shutdown priority, with optimal management of future requests to the turned-off cache banks.
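The three patterns can be pictured as different ways of picking which banks of the LLC grid to turn off. The sketch assumes the LLC is a `rows` x `cols` grid of banks, that the number of banks to turn off `k` has already been obtained from the analytical model (it is not computed here), and that core positions are given in the same grid coordinates; OptTar's management of requests to turned-off banks is not modelled. All names are hypothetical.

```python
def pattern_off(rows, cols, k, pattern, core_pos=()):
    """Return the set of (row, col) banks to turn off, at most k of them."""
    banks = [(r, c) for r in range(rows) for c in range(cols)]
    if pattern == "AltRow":
        # Alternate rows: every odd-numbered row is a shutdown candidate.
        cand = [b for b in banks if b[0] % 2 == 1]
    elif pattern == "Chess":
        # Chessboard: banks whose row + column is odd.
        cand = [b for b in banks if (b[0] + b[1]) % 2 == 1]
    elif pattern == "OptTar":
        if not core_pos:
            raise ValueError("OptTar needs core positions")

        def dist(b):
            # Manhattan distance from bank b to its nearest core.
            return min(abs(b[0] - r) + abs(b[1] - c) for r, c in core_pos)

        # Banks closest to cores get the highest shutdown priority.
        cand = sorted(banks, key=lambda b: (dist(b), b))
    else:
        raise ValueError(f"unknown pattern: {pattern}")
    return set(cand[:k])
```

AltRow and Chess spread the powered-off banks evenly so that they act as thermal buffers between active banks, whereas OptTar concentrates them next to the cores, the hottest components.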
Among these patterns, OptTar achieves the largest reduction in average temperature, around 6°C, with 40.3% savings in leakage energy, whereas AltRow and Chess save 26% and 26.5% leakage energy, respectively. The respective EDP gains for AltRow, Chess and OptTar are 11%, 11.5% and 18.7%.