Summary - Energy and thermal management of CMPs by dynamic cache reconfiguration

Chapter 2. Background 54

• The available literature indicates that large SRAM-based LLCs in modern designs can raise chip temperature by creating hotspots across the large cache area, although caches were usually considered colder on-chip components in earlier designs.

• Cache temperature can be reduced only by applying one of two classical techniques: (a) power gating or (b) regulating the supply voltage (i.e., drowsy cache). Power gating lowers temperature more than the drowsy approach, because gating the power supply to a cache region drastically reduces its power consumption.

• Most earlier cache techniques attempt to reduce temperature by controlling cache accesses, i.e., by reducing dynamic power. Modern thermal-aware cache design, however, also needs to address leakage reduction. Furthermore, most of these techniques were built with larger transistors having a minimum channel length of 65 nm, whereas recent designs have channel lengths of 32 nm or less.

• Based on the superposition and reciprocity theorems [33], it can be hypothesized that a large turned-off cache portion creates an on-chip thermal buffer that can significantly reduce chip temperature.

• Additionally, cache-based policies affect the computation according to its sensitivity to the dynamic WSS of the process, but can maintain a stable thermal profile.

which in turn increases temperature and leakage power consumption [5, 4, 29]. From our survey and the simulation analysis discussed so far, it can be concluded that reducing both leakage power and chip temperature is a primary design concern for architects. In particular, LLCs are major contributors to total on-chip power consumption, and LLC leakage dominates the other power components. Additionally, although LLCs are regarded as comparatively colder on-chip components, the earlier discussion shows that they can also generate on-chip hotspots.

Reducing the effective LLC area can significantly reduce its leakage consumption, and this can be done through either off-line or on-line techniques. The diversity in cache usage patterns of modern applications, together with multi-tasking environments, motivates us to tune the cache size on-line in either direction: provide ample cache to applications when it is required, and shrink the cache as the WSS decreases.

Reducing cache size through a state-destroying policy can aggressively reduce leakage, but may incur stall cycles that degrade system performance. To address these issues, we propose an on-line cache tuning policy that resizes the cache at bank-level granularity based on locality of reference and system performance.
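As a rough illustration of bank-level on-line tuning, the sketch below grows or shrinks the number of powered LLC banks once per epoch. The statistics used (miss rate as a proxy for locality of reference, IPC loss as a proxy for system performance), the threshold values, and all names are assumptions made for this example; it is not the policy proposed in the thesis.

```python
from dataclasses import dataclass


@dataclass
class EpochStats:
    """Hypothetical per-epoch counters sampled by the tuning logic."""
    miss_rate: float  # LLC misses / LLC accesses in the last epoch
    ipc_loss: float   # fractional IPC drop versus a full-size cache


def next_active_banks(active, total, stats,
                      low_miss=0.02, high_miss=0.10, max_ipc_loss=0.05):
    """Return the number of banks to keep powered in the next epoch.

    All thresholds are illustrative assumptions, not values from the thesis.
    """
    # Poor locality or a visible slowdown: give capacity back (turn a bank on).
    if stats.miss_rate > high_miss or stats.ipc_loss > max_ipc_loss:
        return min(total, active + 1)
    # Working set fits comfortably: gate one more bank to save leakage.
    if stats.miss_rate < low_miss:
        return max(1, active - 1)
    # Otherwise keep the current size.
    return active


if __name__ == "__main__":
    banks = 8
    for stats in (EpochStats(0.01, 0.00), EpochStats(0.15, 0.02)):
        banks = next_active_banks(banks, 16, stats)
        print(banks)
```

Changing one bank per epoch keeps the cost of each resize small; a real design would also have to handle the dirty blocks of a bank before gating it, which this sketch omits.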

Power consumption in semiconductor circuitry gives rise to thermal issues. Hence, reducing cache leakage can also help reduce the effective chip temperature. In the next part of the thesis, we therefore propose thermal-aware cache tuning, which reduces cache hotspots by turning them off. Because leakage has a quadratic relationship with temperature, the two form a circular dependency through which these cache portions can further increase chip temperature; it therefore makes sense to turn off the least-used cache portions. The gated cache banks form an on-chip thermal buffer that distributes the generated heat to the components in its close proximity, exploiting the reciprocity and superposition theorems of heat transfer [33]. Thus, the effective chip temperature is reduced. We also design thermal-aware cache resizing for a CCMP, where cache resizing follows some pre-designated patterns to reduce temperature. Both of these thermal-aware designs are performance cognizant, and hence they also turn cache banks back on whenever required.
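The circular dependency between leakage and temperature can be illustrated with a small fixed-point iteration. The sketch below assumes a single cache bank with a lumped thermal resistance and a leakage model that grows quadratically with absolute temperature; the model form, the parameter values, and the function names are all hypothetical and chosen only to make the feedback loop visible.

```python
def leakage_power(temp_k, p_ref=1.0, t_ref=318.0):
    """Assumed quadratic leakage model: p_ref watts at t_ref kelvin."""
    return p_ref * (temp_k / t_ref) ** 2


def steady_state_temp(p_dyn, gated=False, t_amb=318.0, r_th=2.0, iters=50):
    """Iterate T = t_amb + r_th * (P_dyn + P_leak(T)) until it settles.

    t_amb (kelvin) and r_th (kelvin per watt) are illustrative assumptions.
    A gated bank draws neither dynamic nor leakage power in this model,
    so it simply sits at the surrounding temperature.
    """
    if gated:
        return t_amb
    temp = t_amb
    for _ in range(iters):
        # Higher temperature raises leakage, which raises temperature again.
        temp = t_amb + r_th * (p_dyn + leakage_power(temp))
    return temp


if __name__ == "__main__":
    print(steady_state_temp(0.5))              # powered bank
    print(steady_state_temp(0.5, gated=True))  # gated bank
```

With these assumed parameters the powered bank settles slightly above the temperature it would reach if leakage were held at its reference value, while the gated bank contributes no heat of its own.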

Chapter 3 Simulation Framework

This chapter describes the simulation environment used in our work. All experiments reported in this thesis are performed in a full-system simulation framework. Full-system simulators can simulate entire electronic systems, including CMPs. The machine on which the simulation environment runs is called the host machine, and the virtual system modeled by the simulator is called the target machine. The full-system simulator provides CPU cores along with multi-level private/shared caches, memory systems, and I/O devices, which together give the flavour of a real CMP. These system components are connected through a standard NoC module that is also integrated into the framework. Unlike instruction set simulators, the full-system simulator further allows real programs to be executed on an Operating System (OS) installed independently on the target machine. The virtual device drivers of the target machine also allow the OS to execute all of its modules that normally run on real hardware.

Since simulators are sets of computer programs, any program module developed for the target machine's architecture can be modified to meet new design requirements. For example, a conventional cache implemented in a full-system simulator can be modified to support Dynamic Cache Reconfiguration at both the way and bank levels. For design space exploration, we can also easily change basic parameters, such as cache associativity, cache size, and number of banks, in target machines. A brief overview of computer architecture simulators is provided in the next sections, along with their importance in industrial and academic research.
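To make the design space exploration concrete, the sketch below enumerates combinations of cache size, associativity, and bank count and derives the number of sets per bank for each valid combination. The 64-byte line size, the parameter values, and the function name are assumptions for illustration and are not tied to any particular simulator's configuration interface.

```python
from itertools import product

LINE_BYTES = 64  # assumed cache line size


def cache_configs(sizes_kib, associativities, bank_counts):
    """Yield valid (size, ways, banks) combinations with derived geometry."""
    for size_kib, ways, banks in product(sizes_kib, associativities, bank_counts):
        # sets = capacity / (ways * line size); must divide evenly.
        total_sets, rem = divmod(size_kib * 1024, ways * LINE_BYTES)
        # Skip geometries whose sets cannot be split evenly across banks.
        if rem or total_sets == 0 or total_sets % banks:
            continue
        yield {
            "size_kib": size_kib,
            "ways": ways,
            "banks": banks,
            "sets_per_bank": total_sets // banks,
        }


if __name__ == "__main__":
    for cfg in cache_configs([2048, 4096], [8, 16], [8, 16]):
        print(cfg)
```

Each yielded dictionary could then be turned into one simulator run, so that the sweep over the design space stays separate from the simulator-specific configuration code.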
