However, large LLCs account for a significant share of the chip's leakage energy consumption, which has a circular dependence on the effective temperature of the chip circuit.
Modern Chip Multi-Processors (CMPs)
Components in CMPs
Accessing a data block from the home L2 bank (i.e., the bank located in the same tile as the requester core) has lower latency than accessing distant banks (located in other tiles). Note that the L2 in this figure is considered the on-chip LLC, similar to the TCMP architecture.
Power Consumption in CMPs
Dynamic Power
Static Power
$P_s = K_1 V_{DD} T^2 e^{(\alpha V_{DD}+\beta)/T} + K_2 e^{(\gamma V_{DD}+\delta)}$  (1.3)

$P_s$ indicates the static power consumption due to subthreshold leakage of a CMOS circuit.
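To make the temperature dependence of Eq. (1.3) concrete, the sketch below evaluates the leakage model over a range of temperatures. The fitting constants K1, K2, α, β, γ, δ and the supply voltage are illustrative placeholders, not values used in this thesis.

```python
import math

def static_power(vdd, temp_k,
                 k1=1.0e-4, k2=1.0e-6,
                 alpha=2.0, beta=-100.0,
                 gamma=5.0, delta=-1.0):
    """Evaluate Eq. (1.3): Ps = K1*Vdd*T^2*e^((alpha*Vdd+beta)/T) + K2*e^(gamma*Vdd+delta).

    All fitting constants here are placeholders; in practice they are obtained
    by curve-fitting against circuit-level simulation data.
    """
    term_subthreshold = k1 * vdd * temp_k ** 2 * math.exp((alpha * vdd + beta) / temp_k)
    term_const = k2 * math.exp(gamma * vdd + delta)   # temperature-independent term
    return term_subthreshold + term_const

if __name__ == "__main__":
    vdd = 1.0  # assumed supply voltage in volts
    for t_c in (40, 60, 80, 100):
        print(f"T = {t_c:3d} C -> Ps = {static_power(vdd, t_c + 273.15):.4e} (arbitrary units)")
```

The exponential first term is what makes leakage grow sharply with temperature, which in turn raises temperature further; this is the circular dependence referred to above.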
Short-Circuit Power
Thermal Issues in CMPs
Thermal Characteristics
Although power consumption can change almost instantaneously, even in practical cases, temperature cannot. The temperature values shown along the Y-axis are in °C, and the X-axis represents the timestamps during the run.
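The lag between a power change and the resulting temperature change can be illustrated with a simple lumped thermal RC model. The thermal resistance, capacitance, ambient temperature, and power step below are assumed values chosen purely for illustration, not parameters of the simulated chip.

```python
# Minimal lumped thermal RC sketch: temperature follows a power step
# exponentially with time constant tau = R_th * C_th (all values assumed).
R_TH = 0.8      # thermal resistance, K per watt (assumed)
C_TH = 25.0     # thermal capacitance, joules per kelvin (assumed)
T_AMB = 45.0    # ambient/initial temperature in degrees C (assumed)
DT = 0.01       # integration step in seconds

def simulate(power_watts, seconds, temp=T_AMB):
    """Integrate dT/dt = (P*R_th - (T - T_amb)) / (R_th*C_th)."""
    for _ in range(int(seconds / DT)):
        temp += DT * (power_watts * R_TH - (temp - T_AMB)) / (R_TH * C_TH)
    return temp

if __name__ == "__main__":
    # Power steps up instantly, but temperature approaches the new
    # steady state (T_amb + P*R_th) only gradually.
    for t in (1, 5, 20, 60):
        print(f"after {t:3d} s at 30 W: {simulate(30.0, t):.1f} C "
              f"(steady state {T_AMB + 30.0 * R_TH:.1f} C)")
```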
Mitigations at a Glance
The associated thermal capacitance of each chip element does not allow an instantaneous temperature change of the chip components. Because caches are typically colder than the hottest core region, and dynamically resizing large caches does not abruptly impact performance, some cache portions can be turned off, with due attention to performance, to create thermal buffers that reduce the chip temperature.
Cache-based Mitigations
Leakage Minimisation
Controlling Temperature
Objectives of this work
Motivations
- Reduce cache hotspots by independently disabling heavily used banks and least used banks, creating a thermal buffer with a reasonable reduction in leakage.
- Create a (dynamically adjustable) optimal amount of on-chip thermal buffer by disabling cache banks based on their location and proximity to cores, gradually lowering the chip temperature.
Principal Contributions
- DiCeR at Cache Bank Level
- DiCeR in combination with DAM technique
- DiCeR for temperature control in TCMP
- DiCeR for temperature control in CCMP
In particular, the following variations are proposed: (1) B ON ALL. The system may decide to turn off or turn on some cache banks periodically throughout process execution. Cache banks closer to cores get the highest shutdown priority, and future requests destined for the shut-down cache banks are managed appropriately.
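A minimal sketch of the proximity-based shutdown ordering described above: banks are ranked by their Manhattan distance to the nearest core, so banks closer to cores are shut down first. The 4x4 mesh coordinates and corner core placement are assumptions for illustration, not the exact floorplan used in the thesis.

```python
# Rank cache banks for shutdown: banks closest to a core get the
# highest shutdown priority (4x4 mesh coordinates assumed for illustration).
CORE_POSITIONS = [(0, 0), (0, 3), (3, 0), (3, 3)]          # assumed core tiles
BANK_POSITIONS = {b: (b // 4, b % 4) for b in range(16)}    # assumed bank tiles

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def shutdown_order(banks=BANK_POSITIONS, cores=CORE_POSITIONS):
    """Return bank ids sorted so the banks nearest to any core come first."""
    def distance_to_nearest_core(bank_id):
        return min(manhattan(banks[bank_id], c) for c in cores)
    return sorted(banks, key=distance_to_nearest_core)

if __name__ == "__main__":
    print("shutdown priority (highest first):", shutdown_order())
```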
Summary
Resizing is done by turning off/on several cache banks, which can be implemented by power gating at the circuit level. In addition to bank-level granularity, we have also proposed way-level cache resizing, where performance degradation is addressed by including DAM (Dynamic Associativity Management).
Organisation of Thesis
The cache size change decision is triggered based on the dynamic change in system performance. However, from a thermal efficiency perspective, these disabled cache parts are used as on-chip thermal buffers to reduce the effective chip temperature, especially in CMPs with larger LLCs.
Access Patterns of Shared LLCs
- Bank Level Granularity
- Set Level Granularity
- Dynamic Associativity Management (DAM)
- CMP-SVR
- Violation in Locality of Reference
It improves cache performance by reducing the number of conflict misses in the cache. The change in access patterns over long periods of time for a set of PARSEC applications is shown in Figure 2.5 for a TCMP architecture.
Cache Energy Modeling
Dynamic and Leakage Energy
The dynamic energy of the sense amplifiers is E_dyn-senseamps, while the dynamic energy for reading the bitlines is E_dyn-read-bitlines. E_leak-req-net and E_leak-rep-net represent the leakage energy consumption of the request and reply networks, respectively.
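The component-wise terms above can be combined into total dynamic and leakage energy figures. The sketch below simply sums hypothetical per-access and per-cycle components; all numbers are placeholders, not CACTI outputs for the evaluated configuration.

```python
# Hedged sketch: combine per-access dynamic components and per-cycle leakage
# components into total cache energy. All values are illustrative placeholders.
DYNAMIC_PER_ACCESS_NJ = {
    "E_dyn_decoder": 0.02,
    "E_dyn_read_bitlines": 0.15,
    "E_dyn_senseamps": 0.05,
}
LEAKAGE_PER_CYCLE_NJ = {
    "E_leak_bank": 0.010,
    "E_leak_req_net": 0.002,   # request network
    "E_leak_rep_net": 0.002,   # reply network
}

def cache_energy_nj(accesses, cycles):
    dynamic = accesses * sum(DYNAMIC_PER_ACCESS_NJ.values())
    leakage = cycles * sum(LEAKAGE_PER_CYCLE_NJ.values())
    return dynamic, leakage

if __name__ == "__main__":
    dyn, leak = cache_energy_nj(accesses=1_000_000, cycles=50_000_000)
    print(f"dynamic = {dyn / 1e6:.2f} mJ, leakage = {leak / 1e6:.2f} mJ")
```

Even with these toy numbers, leakage dominates because it accrues every cycle regardless of activity, which is the point made throughout this chapter.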
Channel Length, Temperature and Leakage
Pd denotes the dynamic power and Ps the leakage power consumption of the chip circuits. As seen earlier, LLCs consume a significant portion of the total on-chip power, and most of that comes from leakage. Therefore, in Figure 2.7 we show the simulated contribution of an LLC to the total energy consumption of the chip at different temperatures.
Reducing Cache Leakage Consumption
Off-Line Techniques
The authors of [47] generated cache miss equations to summarize the cache access behavior of a loop and its variables. Although this model predicts the cache misses for individual threads, it does not take cache contention into account.
On-Line Techniques
In another exploration [82], a large portion of the tag bits is moved from the cache to an external register (called the Tag Overflow Buffer), which acts as an identifier for the current location of the cache references. This policy shows a leakage saving of more than 40% for most of the tasks the authors used.
Thermal Management in CMPs
Core Level Management
Task migration and DVFS are the most promising and preferred options for reducing core temperature. However, in most cases, task migration and DVFS have a noticeable impact on system performance.
Cache-Based Policies
This technique lowers the core and cache (MRAM/SRAM) layer temperatures by more than 5°C while maintaining critical chip temperatures. Most of the cache-based techniques developed earlier try to reduce the temperature by controlling cache accesses.
Summary
Moreover, LLCs are generally considered to be among the relatively colder on-chip components, but from the previous discussion it can be stated that LLCs can also generate on-chip hotspots. In the next part of the thesis, we therefore propose thermal-aware cache tuning, which reduces cache hotspots by disabling the hot cache banks.
Computer Architecture Simulators
- Simics
- GEMS: An Overview
- CACTI
- McPAT
- HotSpot
Each of the L1 caches has a sequencer associated with it to manage the requests from the corresponding core. For our last two contributions we need to simulate the power and temperature of the entire chip.
Benchmarks
PARSEC Benchmark Suite
Freqmine performs Frequent Itemset Mining (FIMI) [141] with an array-based version of the Frequent Pattern-growth method. Swaptions prices a portfolio of swaptions using the Heath-Jarrow-Morton (HJM) framework [142].
Simulation Methodologies
- Used Benchmark Applications
- Executing Benchmarks
- Comparing Different CMP Architectures
- Our Architectural Models
Reaching the ROI implies that benchmark initialization is over and all threads have been spawned. The rest of the process, such as the warm-up and run policies, is the same as for the multi-threaded benchmarks.
Introduction
In this regard, our policy dynamically shuts down or enables cache banks based on system performance and bank usage statistics. B ON OFF ALL is a performance-driven dynamic cache tuning strategy that adjusts the size of the L2 cache by disabling cache banks.
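A hedged sketch of such a performance-driven tuning loop: at each reconfiguration interval it powers off the least-used bank if the IPC degradation is within a threshold, and re-enables a bank otherwise. The interval, threshold, and statistics source are assumptions; the thesis policies (B ON ALL, B ON OFF ALL, B ON OFF OPT, B OFF ONCE) differ in details not captured here.

```python
# Sketch of a periodic bank on/off tuning loop driven by IPC degradation.
# Threshold and usage statistics below are illustrative assumptions.
IPC_DEGRADATION_LIMIT = 0.02   # allow at most 2% IPC loss (assumed)

def reconfigure(enabled, usage, baseline_ipc, current_ipc):
    """Decide which banks stay on for the next interval.

    enabled      -- set of currently powered-on bank ids
    usage        -- dict bank id -> accesses in the last interval
    baseline_ipc -- IPC with all banks enabled
    current_ipc  -- IPC measured in the last interval
    """
    degradation = (baseline_ipc - current_ipc) / baseline_ipc
    if degradation <= IPC_DEGRADATION_LIMIT and len(enabled) > 1:
        # Performance is healthy: power off the least-used enabled bank.
        victim = min(enabled, key=lambda b: usage.get(b, 0))
        enabled = enabled - {victim}
    elif degradation > IPC_DEGRADATION_LIMIT:
        # Performance dropped too much: bring one disabled bank back.
        disabled = set(usage) - enabled
        if disabled:
            enabled = enabled | {max(disabled, key=lambda b: usage.get(b, 0))}
    return enabled

if __name__ == "__main__":
    banks = {b: (b + 1) * 100 for b in range(8)}   # fake usage counters
    on = set(banks)
    on = reconfigure(on, banks, baseline_ipc=1.00, current_ipc=0.99)
    print("enabled banks after one interval:", sorted(on))
```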
Proposed Energy Saving Policy
Bookkeeping and Future Requests
Similarly, during the flushing process, the target bank of the bank being flushed will stall and not handle any requests. Storage overhead for source bank tracking: Target bank selection is decided at runtime based on cache bank usage statistics.
Constraints to maintain
Previous works, in which power saving was achieved by remapping techniques, used a remapping table at the L1 cache level; whenever a new bank is closed, the entries in all L1 caches must be updated. To see the effect of bank closure on IPC, we compare the baseline IPC with the IPC of a pure bank-closure policy (where the number of closed banks is limited to 50% of the total number of banks) through a set of simulations.
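For contrast with those earlier L1-level remapping schemes, the sketch below shows the bookkeeping cost they imply: when a bank is closed, every L1's remap table entry pointing at that bank must be rewritten to the chosen target bank. The table layout and names are hypothetical, introduced only for illustration.

```python
# Hypothetical L1-level remap tables: one table per core, mapping an L2
# bank id to the bank that currently services its requests.
NUM_CORES = 16
NUM_BANKS = 16

remap_tables = [{b: b for b in range(NUM_BANKS)} for _ in range(NUM_CORES)]

def close_bank(closed_bank, target_bank):
    """Redirect all references to a closed bank; every L1 table is touched,
    which is exactly the update overhead argued against in the text."""
    updates = 0
    for table in remap_tables:
        for bank, mapped_to in table.items():
            if mapped_to == closed_bank:
                table[bank] = target_bank
                updates += 1
    return updates

if __name__ == "__main__":
    print("entries rewritten when bank 5 closes:", close_bank(5, 7))
```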
Experimental Evaluation
Experimental Setup
Configuration details for this simulation setup are given in Table 4.3, which lists the processor, memory, and network configurations. The PARSEC benchmark suite [6] was used as the application workload to validate our proposed architecture; its details are given in Chapter 3.
Results and Analysis
- Comparison with baseline architecture
- Comparison with BSP and Drowsy [1]
- Analysis of power savings by varying the IPC constraint
- Analysis of proposed policy on a larger cache
- Summary
Relative to baseline, B ON OFF OPT delivers 19% and 29% more static energy savings than Drowsy C1 and Drowsy C2, respectively. Because B ON OFF OPT performs the reconfiguration intermittently, it has more control over the cache size.
Conclusion
Policy B ON OFF OPT gives the best savings compared to B ON OFF ALL and B OFF ONCE. The performance degradation is also less than 2%, especially in the case of B ON OFF OPT.
Introduction
The leakage consumption can be reduced by using a combination of bank-level and way-level shutdown, where the banks with minimal usage are candidates for shutdown (as in BSP), while in some moderately used banks a few ways are disabled to reduce leakage. This helps restore the performance loss and also creates new opportunities to turn off some ways in each set.
Memory Latency in DiCeR
Here h_ij is the number of hits at bank j whose requests were generated at core i. During reconfiguration, the system also needs a few clock cycles to move data from the victim bank to the target bank, or vice versa, which is negligible if done a limited number of times.
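Using the h_ij notation above, the average L2 access latency can be estimated as a hit-weighted mean of the core-to-bank network distances. The per-hop latency, bank access latency, and the small example matrices below are assumptions for illustration, not values from the thesis model.

```python
# Hit-weighted average access latency from per-(core, bank) hit counts h[i][j].
# Hop latency, bank latency, and distances are assumed example values.
HOP_CYCLES = 3          # assumed per-hop network latency
BANK_ACCESS_CYCLES = 6  # assumed bank access latency

def average_latency(hits, hops):
    """hits[i][j]: hits at bank j requested by core i; hops[i][j]: mesh hops."""
    total_hits = sum(sum(row) for row in hits)
    weighted = sum(h * (BANK_ACCESS_CYCLES + 2 * HOP_CYCLES * d)
                   for hit_row, hop_row in zip(hits, hops)
                   for h, d in zip(hit_row, hop_row))
    return weighted / total_hits if total_hits else 0.0

if __name__ == "__main__":
    hits = [[900, 100],     # core 0 mostly hits its home bank 0
            [200, 800]]     # core 1 mostly hits its home bank 1
    hops = [[0, 2],
            [2, 0]]
    print(f"average L2 hit latency: {average_latency(hits, hops):.2f} cycles")
```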
Proposed Energy Saving Policy
BSP vs. BSP SVR: A Comparative Analysis
More EDP gain is achieved by BSP SVR over BSP due to its lower NoC energy consumption. The performance loss threshold is set at 3% for both BSP and BSP SVR while running the simulations.
DiCeR with DAM at Multiple Granularities
Managing associativity: after turning off ways, the effective associativity of the bank is reduced, increasing the number of conflict misses. This is tolerable because CMP-SVR helps increase the effective associativity by using its RT-based partitioning.
Experimental Evaluation
Results and Analysis
Static Energy Savings: The obtained savings in cache leakage energy are shown in Figure 5.11 for different benchmarks. Effect on smaller cache size: Figure 5.14 shows the EDP savings and Figure 5.15 shows the static power savings for a 2MB, 8-way associative L2 cache.
Conclusion
Closing banks in this regard can be a promising option to achieve such goals, because closing (larger) banks creates (larger) thermal buffers on the chip with remarkably high leakage savings. Since reducing power consumption plays a crucial role in lowering the temperature, this work uses DiCeR to create on-chip thermal buffers that lower the average chip temperature without disrupting the computation.
Introduction
The contents of a switched-off bank are moved to another, cooler memory bank, called the target bank. If the performance degrades by more than a threshold value, the disabled banks will be re-enabled.
Thermal issues: from LLC perspective
- LLC: Thermal Characteristics
- Thermal Management: Core vs Cache
- Modeling Tile Temperature in TCMP
- Target Bank Selection
- Reconfiguring the LLC
- Algorithmic Design
In both cases, powered-down banks create thermal buffers and help reduce the chip temperature. If the performance degradation is within a predefined limit (δ) and the hottest bank is not a target bank, the system shuts down the hottest bank after migrating its contents to the target bank and updating the remapping at the shut-down bank's controller (lines 11 to 15).
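A hedged sketch of the per-interval decision just described: if the measured performance loss is within δ and the hottest bank is not already a target, its contents are migrated and the bank is powered down; otherwise disabled banks are re-enabled. Temperatures, IPC values, and the migration step are mocked, and the function names are illustrative rather than the thesis algorithm's notation.

```python
# Sketch of one reconfiguration step: shut down the hottest non-target bank
# when the performance loss is within delta, otherwise re-enable all banks.
DELTA = 0.03   # allowed performance degradation (assumed)

def step(all_banks, banks_on, targets, temps, base_ipc, cur_ipc, migrate):
    """banks_on: powered banks; targets: banks acting as targets;
    temps: bank id -> temperature; migrate(src, dst) moves cached contents."""
    loss = (base_ipc - cur_ipc) / base_ipc
    if loss > DELTA:
        return set(all_banks)              # degradation too high: enable everything
    hottest = max(banks_on, key=lambda b: temps[b])
    if hottest in targets:
        return banks_on                    # never shut down a target bank
    target = min(targets, key=lambda b: temps[b])   # pick the coolest target bank
    migrate(hottest, target)
    return banks_on - {hottest}

if __name__ == "__main__":
    temps = {0: 78.0, 1: 71.5, 2: 69.0, 3: 68.2}
    on = step(all_banks={0, 1, 2, 3}, banks_on={0, 1, 2, 3}, targets={2, 3},
              temps=temps, base_ipc=1.0, cur_ipc=0.99,
              migrate=lambda s, d: print(f"migrating bank {s} -> bank {d}"))
    print("powered banks:", sorted(on))
```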
Experimental Analysis
- Simulation Setup
- Temporal effect of cache resizing
- EDP gain
- Effect on Tile and Chip Thermal Profile
- Varying Reconfiguration Interval
- Summary
We show the final change in NoC latency and NoC energy consumption for all our applications in Figures 6.7 and 6.8, respectively. Both policies have good leakage savings as shown in Figure 6.10 for all our applications.
Conclusion
The creation of thermal buffers, together with the reduction in leakage, lowers the effective temperature of the tiles in our TCMP model. In such cases, the cooling rate of these banks is very slow even when they are turned off.
Introduction
In the case of caches, heavily used blocks can create cache hotspots, while the least used parts of the cache unnecessarily increase leakage consumption, contributing to the chip temperature. Additionally, for some modern applications, the cache access patterns do not match the classic Locality of Reference property over longer periods.
Background
CCMP and its Leakage Hungry LLC
The basic architecture used in this work (shown in Figure 7.1) contains 16 homogeneous CPU cores, which are placed along the edge of the chip. For our basic architecture (Figure 7.1), the total power consumption of the chip can be divided into two main components (not counting the interconnection and I/O interfaces): (I) the power consumed by the individual cores, including the processing power and the power consumed by the L1 caches (data and instruction); and (II) the power consumed by the LLC (L2 in our case).
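Expressed as an equation (the symbols here are introduced only for illustration), this decomposition reads: $P_{chip} \approx \sum_{i=1}^{16} \big(P_{core_i} + P_{L1_i}\big) + P_{LLC}$, where $P_{L1_i}$ accounts for both the data and instruction L1 caches of core $i$.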
Thermal Potential of the Centralised LLCs
Independent of cache accesses, the leakage power, which has a circular dependence on temperature, can be reduced by closing memory banks. Therefore, reducing the leakage power of the LLC by dynamically resizing it can be a promising option for reducing the chip temperature without affecting the computing units, as we have already seen earlier.
Runtime Cache Behaviour
Preliminaries and Analytical Problem Formulation
- Core Temperature Modeling
- Problem Formulation
- Performance Modeling with Cache Size
- Thermal Model
- Combined Analytics
- Finding the Optimal b
- Patterns for Cache Resizing
On the other hand, drastically reducing the cache size limits performance by incurring more cache misses. Performance depends on the number of cache misses, the number of available cache banks, the off-chip access delay, and the delay incurred by requests redirected to target banks.
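A hedged sketch of how such a model can be scanned to choose the number of enabled banks b: a toy miss-rate curve, hit and miss penalties, and the redirect delay below are placeholder assumptions standing in for the analytical formulation of this chapter, not its actual parameters.

```python
# Scan candidate bank counts b and pick the smallest one whose modeled
# average access time stays within a performance budget. All values assumed.
TOTAL_BANKS = 16
HIT_CYCLES = 10          # local hit latency (assumed)
REDIRECT_CYCLES = 8      # extra delay to a remapped target bank (assumed)
OFFCHIP_CYCLES = 300     # off-chip miss penalty (assumed)
BASE_MISS_RATE = 0.02    # miss rate with all banks enabled (assumed)

def miss_rate(b):
    # Toy model: misses grow as the cache shrinks (power-law assumption).
    return BASE_MISS_RATE * (TOTAL_BANKS / b) ** 0.5

def avg_access_time(b):
    redirected_fraction = (TOTAL_BANKS - b) / TOTAL_BANKS
    hit_latency = HIT_CYCLES + redirected_fraction * REDIRECT_CYCLES
    return (1 - miss_rate(b)) * hit_latency + miss_rate(b) * OFFCHIP_CYCLES

def best_bank_count(max_loss=0.05):
    baseline = avg_access_time(TOTAL_BANKS)
    feasible = [b for b in range(1, TOTAL_BANKS + 1)
                if avg_access_time(b) <= baseline * (1 + max_loss)]
    return min(feasible)   # smallest b (largest thermal buffer) within budget

if __name__ == "__main__":
    for b in range(1, TOTAL_BANKS + 1):
        print(f"b = {b:2d}: AAT = {avg_access_time(b):6.2f} cycles")
    print("smallest acceptable number of enabled banks:", best_bank_count())
```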
DiCeR for Thermal Efficiency
Algorithms and Discussions
The clusters along the periphery each have one ON bank and three OFF banks. The ON bank in a group becomes the target for the OFF banks in that group.
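A small sketch of this clustering: banks are grouped into clusters of four, one bank per cluster stays ON, and it acts as the target for its three OFF neighbours. Grouping by consecutive bank ids and keeping the first bank of each cluster ON are assumptions for illustration; the actual clusters follow the chip floorplan.

```python
# Group 16 peripheral banks into clusters of four; keep one bank ON per
# cluster and route the three OFF banks of that cluster to it.
# Cluster membership by consecutive ids is an illustrative assumption.
NUM_BANKS = 16
CLUSTER_SIZE = 4

def build_clusters():
    mapping = {}    # OFF bank id -> target (ON) bank id
    on_banks = []
    for start in range(0, NUM_BANKS, CLUSTER_SIZE):
        cluster = list(range(start, start + CLUSTER_SIZE))
        target = cluster[0]   # assume the first bank of each cluster stays ON
        on_banks.append(target)
        for bank in cluster[1:]:
            mapping[bank] = target
    return on_banks, mapping

if __name__ == "__main__":
    on, remap = build_clusters()
    print("ON (target) banks:", on)
    print("OFF bank -> target:", remap)
```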