4.4 Results and Analysis

4.4.1 Comparison with baseline architecture

Energy Delay Product (EDP)

Figure 4.6 shows the EDP savings (Y-axis) of all the policies, normalised to the baseline, for the benchmark programs listed along the X-axis. The energy accounted for here covers the L2 cache, the network and DRAM accesses. Energy consumed by other on-chip components, such as the L1 caches and CPU cores, is not included in the EDP or total energy figures in this work. Applications whose WSS is much smaller than the available cache show significant savings under B ON OFF ALL and B ON OFF OPT. Under B OFF ONCE, however, a few applications show smaller savings. These applications need more cache space initially, so fewer cache banks are shut down, and B OFF ONCE allows no further bank shutdown once IPC degradation exceeds the given threshold. On average, the three policies save 29%, 27% and 30% in EDP over our baseline architecture.
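The normalised EDP saving reported here can be illustrated with a minimal sketch, assuming the usual definition of EDP as energy multiplied by execution time. The function names and the input numbers are hypothetical and chosen only to show the arithmetic; they are not measurements from this work.

```python
def edp(energy_joules: float, delay_seconds: float) -> float:
    """Energy-delay product: energy consumed multiplied by execution time."""
    return energy_joules * delay_seconds


def edp_savings(base_energy: float, base_delay: float,
                policy_energy: float, policy_delay: float) -> float:
    """Fractional EDP saving of a policy relative to the baseline."""
    base = edp(base_energy, base_delay)
    if base <= 0:
        raise ValueError("baseline EDP must be positive")
    return 1.0 - edp(policy_energy, policy_delay) / base


# Hypothetical inputs: the policy uses 30% less energy (L2 + network + DRAM)
# but runs 2% longer than the baseline.
saving = edp_savings(base_energy=10.0, base_delay=1.0,
                     policy_energy=7.0, policy_delay=1.02)
print(f"EDP saving: {saving:.1%}")
```

Because delay multiplies energy, a small slowdown erodes part of the energy saving, which is why a policy has to keep IPC degradation low to retain its EDP gain.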

Static energy savings

Static energy is the main component of chip energy that we save by tuning the cache size dynamically. Figure 4.7 shows the static energy consumption of the different policies, normalised to the baseline architecture. On average, the policies save 66%, 59% and 65% of static energy over the baseline.

Shutting down cache banks eliminates switching activity in those areas of the chip, which in turn helps to control the temperature rise in those parts of the chip. Note that this remap policy can also be used to relocate cache requests from hotter parts of a chip to cooler ones.

Chapter 4. DiCeR 94

Figure 4.7: Normalised static energy consumption, with 4MB L2 cache.

Execution Time

Dynamic cache tuning requires migrating the valid blocks of the bank being shut down and forwarding subsequent requests to the target bank. As a result, network traffic increases and the target bank sees slightly more cache misses. These overheads degrade overall IPC. The average migration overhead is around 1.3% of total execution time, which is negligible. Figure 4.8 shows the IPC of all the proposed policies, normalised to the baseline, across the benchmarks. The average IPC degradation for B ON OFF ALL, B OFF ONCE and B ON OFF OPT is 2.9%, 1.5% and 1.8%, respectively. The excessive cache tuning in B ON OFF ALL adds idle clock cycles to the system, and hence performance degrades. The controlled cache tuning of B ON OFF OPT degrades performance significantly less than B ON OFF ALL. B OFF ONCE changes the cache size in a very restricted way, so its performance degradation is lower than that of the other two policies, but it does not save more energy than B ON OFF OPT. B ON OFF ALL, however, allows better control over the cache size and therefore yields larger EDP savings than B OFF ONCE.
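B OFF ONCE allows no further bank shutdown once IPC degradation exceeds a given threshold. A minimal sketch of that one-way gating, assuming degradation is measured as the relative drop from a baseline IPC, might look as follows. The class name, its methods and the 5% threshold are hypothetical; the actual mechanism and threshold value are not specified in this section.

```python
class ShutdownGate:
    """Hypothetical helper: blocks bank shutdowns permanently once
    IPC degradation has exceeded the threshold."""

    def __init__(self, baseline_ipc: float, threshold: float):
        if baseline_ipc <= 0:
            raise ValueError("baseline IPC must be positive")
        self.baseline_ipc = baseline_ipc
        self.threshold = threshold
        self.locked = False  # becomes True once the threshold is crossed

    def degradation(self, current_ipc: float) -> float:
        """Relative IPC loss with respect to the baseline (0.02 means 2%)."""
        return 1.0 - current_ipc / self.baseline_ipc

    def may_shut_down(self, current_ipc: float) -> bool:
        """Return True if another bank may still be shut down."""
        if self.degradation(current_ipc) > self.threshold:
            self.locked = True
        return not self.locked


# Hypothetical usage with an assumed 5% threshold.
gate = ShutdownGate(baseline_ipc=2.0, threshold=0.05)
print(gate.may_shut_down(1.95))  # 2.5% degradation: shutdown still allowed
print(gate.may_shut_down(1.80))  # 10% degradation: gate locks
print(gate.may_shut_down(2.00))  # IPC recovered, but the gate stays locked
```

The latch is what makes the policy restrictive: even if IPC recovers later, no further banks are turned off, which keeps degradation low at the cost of some energy savings.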

Figure 4.8: Normalised IPC value for different benchmark applications, with 4MB L2 cache.

Figure 4.9: Normalised network energy value for different benchmark applications, with 4MB L2 cache.

Network overhead

When a cache bank is shut down, its contents must be transferred to the target bank, and all future requests must be forwarded there. Likewise, when a cache bank is turned on, all of its contents held in the target bank must be brought back before it resumes normal operation. This increases network traffic, as is evident from the increased network energy consumption. However, the cache portions that are turned off or on are the least accessed ones, so the traffic transferred is small relative to normal baseline communication. Frequent cache tuning nevertheless increases network traffic. The results in Figure 4.9 show that network energy does not increase significantly, and the small increment of around 1.23% is compensated by the reduction in static power, as is evident from the overall 30% savings in EDP.
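The forwarding behaviour described above can be pictured with a small remap table. This is only an illustrative sketch: the class, its method names and the restriction that a target bank must itself be powered on are assumptions made for the example, and block migration is represented only by a comment.

```python
class BankRemapTable:
    """Hypothetical model of request forwarding for powered-off banks."""

    def __init__(self, num_banks: int):
        self.num_banks = num_banks
        self.target_of = {}  # victim bank id -> target bank id
        self.forwarded = 0   # requests that needed a forwarding step

    def _check(self, bank: int) -> None:
        if not 0 <= bank < self.num_banks:
            raise ValueError(f"bank {bank} out of range")

    def shut_down(self, victim: int, target: int) -> None:
        """Power off `victim`; its valid blocks would migrate to `target`."""
        self._check(victim)
        self._check(target)
        if victim == target:
            raise ValueError("a bank cannot be its own target")
        if victim in self.target_of:
            raise ValueError("victim bank is already off")
        if target in self.target_of:
            raise ValueError("target bank must be powered on")
        if victim in self.target_of.values():
            raise ValueError("victim currently hosts another bank's data")
        self.target_of[victim] = target

    def turn_on(self, victim: int) -> None:
        """Power `victim` back on; its blocks would be brought back first."""
        if victim not in self.target_of:
            raise ValueError("bank is not powered off")
        del self.target_of[victim]

    def route(self, bank: int) -> int:
        """Return the bank that actually serves a request addressed to `bank`."""
        self._check(bank)
        if bank in self.target_of:
            self.forwarded += 1
            return self.target_of[bank]
        return bank


table = BankRemapTable(num_banks=16)
table.shut_down(victim=5, target=6)
print(table.route(5), table.route(3))  # request to bank 5 is served by bank 6
table.turn_on(5)
print(table.route(5), table.forwarded)
```

Counting forwarded requests in this way gives a rough handle on the extra network traffic: if the powered-off banks are the least accessed ones, the counter stays small relative to the total number of requests.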

Additionally, Figure 4.10 shows the change in NoC latency for all the applications.

Figure 4.10: Normalised NoC latency for different benchmark applications, with 4MB L2 cache.

In most cases, the increase in latency is not significant; the highest increase is 4.7%, for ferret4. For freq1, body1 and ferret1, on the other hand, latency decreases in some cases. The reduced NoC latencies in Figure 4.10 correspond to the performance improvements in Figure 4.8. This decrease occurs because fewer requests are remapped after banks are shut down, i.e. the number of requests to the turned-off banks falls during execution.

Another reason for the reduced NoC latency could be that the target banks are closer to the accessing core than the original (victim) bank.

Note that overall bank access patterns change with the policy, so NoC latency varies across policies for the same application.

Effect in Total Energy Consumption

Savings in static energy effectively reduce the total energy consumption. Figure 4.11 shows the reduction in total energy consumption, driven by significant static energy savings relative to the baseline architecture. The figure compares the proposed policy B ON OFF OPT with the baseline architecture (B and O are suffixed to the benchmark names to denote the baseline and the B ON OFF OPT policy, respectively). The slight increase in network energy due to remapping is compensated by significant EDP gains from performance-aware static energy savings. Dynamic energy (the energy of cache accesses) is added to the network energy in this figure.
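How a large static saving outweighs a small network increase can be shown with a short weighted sum. In the sketch below, the 65% static saving and the 1.23% network increase are the averages quoted in this section, while the component shares of total baseline energy (60% static, 10% network, 30% other) are purely hypothetical assumptions, as is the function name.

```python
def relative_total_change(shares: dict, changes: dict) -> float:
    """Relative change in total energy, given each component's share of the
    baseline total and its relative change (components not listed in
    `changes` are assumed unchanged)."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("component shares must sum to 1")
    return sum(share * changes.get(name, 0.0) for name, share in shares.items())


# Hypothetical shares of baseline energy; NOT taken from Figure 4.11.
shares = {"static": 0.60, "network": 0.10, "other": 0.30}
# Average changes quoted in the text for static and network energy.
changes = {"static": -0.65, "network": +0.0123}

change = relative_total_change(shares, changes)
print(f"Total energy change: {change:+.1%}")
```

Under these assumed shares the network term contributes roughly a tenth of a percentage point, so the sign and size of the total change are set almost entirely by the static term.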

Figure 4.11: Normalised total energy consumption with a detailed breakdown of its components, for different benchmark applications, with 4MB L2 cache.

From the document: Energy and thermal management of CMPs by dynamic cache reconfiguration (pages 119-123)