Results and Analysis - LiNoVo: Longevity Enhancement of Non-Volatile Caches by

Among the different configurations, MWWR is evaluated on a 16 MB, 16-way set-associative L2 cache with I = 5M cycles, M = 128, and m = 4. In a later section, we analyze the effect of changing these values on the proposed approach. We present the results using the following metrics: coefficient of intra-set write variation (IntraV, calculated using eq. (2.2)), relative lifetime improvement (calculated using eq. (2.3)), speedup, energy overhead, and invalidations.
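To make the IntraV metric concrete, the sketch below computes a coefficient of intra-set write variation from per-way write counts. Eq. (2.2) is not reproduced in this section, so the formula used here is an assumption: the sample standard deviation of the write counts within each set, averaged over all sets and divided by the mean write count per block of the whole cache, which is a definition commonly used in the wear-leveling literature. The function name and the input layout are hypothetical.

```python
from math import sqrt


def intra_set_variation(write_counts):
    """Return the coefficient of intra-set write variation as a fraction.

    write_counts[i][j] is the number of writes to way j of set i.
    Assumed definition (not taken from eq. (2.2)): per-set sample standard
    deviation of the write counts, averaged over the sets and normalized
    by the mean write count per block of the whole cache.
    """
    num_sets = len(write_counts)
    ways = len(write_counts[0])  # must be at least 2
    total = sum(sum(row) for row in write_counts)
    global_mean = total / (num_sets * ways)
    if global_mean == 0:
        return 0.0  # no writes at all, hence no variation
    acc = 0.0
    for row in write_counts:
        set_mean = sum(row) / ways
        acc += sqrt(sum((w - set_mean) ** 2 for w in row) / (ways - 1))
    return acc / (num_sets * global_mean)


# Evenly written set: no variation.
uniform = intra_set_variation([[10, 10, 10, 10]])
# All writes land on a single way: the variation is large.
skewed = intra_set_variation([[0, 0, 0, 40]])
```

Under this assumed definition, a perfectly uniform set yields 0, while a set whose writes all hit one way yields a value well above 100%, which is the kind of imbalance that the reported percentages quantify.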

5.4.1 Coefficient of Intra-Set Write Variation

Figure 5.4 shows the coefficient of intra-set write variation. The proposed technique, MWWR, reduces the intra-set write variation from 154.4% (STT), 114.6% (PoLF), and 88.8% (Wsmooth) to 61.4%. The reduction over STT is due to the uniform write distribution that MWWR achieves inside each module. The further reduction against PoLF and Wsmooth has two reasons: (i) PoLF invalidates data randomly, without considering its write behavior. (ii) MWWR selects the m hot sub-ways for write restriction in every interval, whereas Wsmooth turns off the hot sub-way only when the module's intra-set write variation exceeds λ. Note that the hot sub-way(s) are the sub-way(s) with the maximum write count(s) among the sub-ways.

Chapter 5. Intra-Set Wear Leveling using Horizontal Partitions

Figure 5.4: Intra-Set write variation (lower is better). [Bar chart; y-axis: IntraV; series: STT, PoLF, Wsmooth, MWWR]

Figure 5.5: Normalized lifetime with respect to STT (higher is better). [Bar chart; y-axis: Norm. Lifetime; series: PoLF, Wsmooth, MWWR]
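The per-interval choice of hot sub-ways described in Section 5.4.1 can be illustrated with a minimal sketch. It assumes that each module keeps one write counter per sub-way and that, at the end of an interval, the m sub-ways with the largest counts become write-restricted. The function name, the tie-breaking rule (lower index first), and the sample counter values are hypothetical and are not taken from the MWWR implementation.

```python
import heapq


def select_hot_subways(write_counters, m):
    """Return the indices of the m sub-ways with the highest write counts.

    write_counters[j] is the write count of sub-way j in one module.
    Ties are resolved in favor of the lower sub-way index (an assumption).
    """
    return heapq.nlargest(
        m, range(len(write_counters)), key=write_counters.__getitem__
    )


# Hypothetical counters of one module with 8 sub-ways, m = 4.
counters = [5, 90, 12, 88, 3, 40, 77, 1]
restricted = select_hot_subways(counters, 4)  # [1, 3, 6, 5]
```

Because the selection is repeated in every interval, the restricted set follows the current write behavior instead of waiting for a variation threshold to be crossed.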

5.4.2 Relative Lifetime Improvement

Figure 5.5 shows the lifetime normalized to STT for PoLF, Wsmooth, and MWWR. MWWR improves the lifetime by 4.25 times over STT, 2.71 times over PoLF, and 1.63 times over Wsmooth. These improvements follow from the lower write variation coefficient achieved by MWWR.
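As a rough consistency check on these factors, the sketch below derives the lifetimes of PoLF and Wsmooth relative to STT that the three reported numbers imply. It assumes that each factor is a plain ratio of lifetimes; if the reported values are averages over benchmarks, the derived figures are only approximate. The variable names are illustrative.

```python
# Reported lifetime improvement factors of MWWR (Section 5.4.2).
mwwr_vs_stt = 4.25
mwwr_vs_polf = 2.71
mwwr_vs_wsmooth = 1.63

# Implied lifetimes relative to STT, assuming the factors are simple ratios.
polf_vs_stt = mwwr_vs_stt / mwwr_vs_polf        # about 1.57
wsmooth_vs_stt = mwwr_vs_stt / mwwr_vs_wsmooth  # about 2.61
```

Read this way, the ordering STT < PoLF < Wsmooth < MWWR in lifetime mirrors the ordering of the IntraV values reported in Section 5.4.1.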

Figure 5.6: Normalized CPI with respect to STT (lower is better). [Bar chart; y-axis: Norm. CPI; series: PoLF, Wsmooth, MWWR]

5.4.3 Effect on Performance

Figure 5.6 presents the performance as CPI normalized to STT. MWWR matches the performance of the baseline STT because it evicts only the LRU block from the other sub-ways, which increases the miss rate by only 2.4%. The corresponding miss-rate increases for PoLF and Wsmooth are 12.8% and 29.3%. PoLF degrades CPI by 1% with respect to STT due to the increased allocations and evictions of MRU blocks in the cache.

Wsmooth degrades performance by 3.32% with respect to STT. This degradation comes from turning off sub-ways in each module, which increases the miss rate, and from the extra cycles that write operations take when a block is transferred from a hot sub-way to a cold sub-way inside the module. MWWR, in contrast, incurs no performance loss compared to the baseline STT even though less cache capacity is available in each module, because only LRU blocks are evicted from the other sub-ways (those that are not write-restricted). Note that there is a trade-off between the number of sub-ways available for allocation in each module and the performance loss, as can be seen in Table 5.5 (rows 4 and 5).

Figure 5.7: Energy overhead with respect to STT (lower is better). [Bar chart; y-axis: Energy Overhead; series: PoLF, Wsmooth, MWWR]

Figure 5.8: Normalized invalidation with respect to Wsmooth (lower is better). [Bar chart; y-axis: Norm. Invalidations; series: PoLF, Wsmooth, MWWR]

5.4.4 Effect on Energy

Due to write redirections and the transfer of tags, MWWR consumes slightly more energy than the baseline STT, as shown in Figure 5.7. The energy overhead over the baseline STT is merely 0.27%. With respect to PoLF and Wsmooth, however, MWWR lowers the energy overhead by 1.14% and 3.3%, respectively. This gain has two reasons: (i) PoLF invalidates MRU blocks, which increases the allocations (writes) in the cache. (ii) Wsmooth moves the block from the hot sub-way to the cold sub-way, which incurs extra writes.

Metric    IntraV    Lifetime    EDP Loss
MWWR      84%       3.10        0.32%

Table 5.4: Comparison analysis between FAWLT and MWWR

5.4.5 FLASH based Adaptive Wear Leveling Technique (FAWLT) versus MWWR

As in the previous chapter, we compare the FLASH based Adaptive Wear Leveling Technique (FAWLT) [120] with the proposed wear-leveling technique, MWWR. Table 5.4 lists the results of this comparison. As the table shows, the proposed approach works significantly better than FAWLT, with a marginal increase in EDP.

5.4.6 Effect on Invalidation

Figure 5.8 shows the invalidations of MWWR and PoLF normalized to Wsmooth. MWWR reduces the invalidations by 45.3% relative to PoLF and by 81.6% relative to Wsmooth. Compared to PoLF, the reduction comes from the selective invalidations performed by MWWR. Compared to Wsmooth, the reduction is mainly due to the difference between the write redirection policy of MWWR (transfer to any position in the cache set) and the block-transfer policy of Wsmooth (transfer from the hot sub-way to one cold sub-way).

5.4.7 Storage Overhead

In MWWR, each module of the cache uses A 10-bit counters Cij to record the write accesses of each sub-way. In addition, 128 42-bit swap buffers are used to transfer the tags. The list ModulewayList, of size m × M, consists of entries of log₂A bits each. Thus, the storage overhead of MWWR compared to the baseline STT-RAM is merely 0.02% for the selected values. On the other hand, MWWR saves 81.1% of the storage overhead compared to Wsmooth (the storage bits required for the counters and swap buffer are taken from [33]).

Param.       Norm. Lft.   IntraV Base   IntraV MWWR   Norm. EDP   Invalidations (k)
Reference    4.25         154.4%        61.4%         1.02        645
I=2M         4.83         154.4%        57.6%         1.03        680
I=10M        3.85         154.4%        64.5%         1.02        573
m=8          4.94         154.4%        60.3%         1.04        786
m=2          3.86         154.4%        67.4%         1.01        458
M=256        4.68         154.4%        59.4%         1.03        658
M=64         3.9          154.4%        62.8%         1.02        612
8MB          4.92         119.3%        41.2%         1.02        751
32MB         4.1          186.5%        85.7%         1.02        504
A=8way       3.70         120.9%        57.4%         1.04        714
A=32way      5.47         189.4%        79.7%         1.02        480

Table 5.5: Comparative analysis for different parameters of L2 cache and algorithm (Lft. = lifetime, Base = baseline STT-RAM); Reference = 16 MB, 16-way, m = 4, M = 128, I = 5M
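The 0.02% storage figure from Section 5.4.7 can be approximated with a short calculation. The sketch assumes that the cache holds M modules in total, that each module has A counters, and that the overhead is measured against the 16 MB data array alone (tag storage excluded). These are assumptions made for illustration, and the function and parameter names are hypothetical.

```python
from math import ceil, log2


def mwwr_storage_bits(A=16, M=128, m=4, counter_bits=10,
                      swap_buffers=128, swap_buffer_bits=42):
    """Estimate the extra storage of MWWR in bits (assumed accounting)."""
    counters = A * M * counter_bits            # A 10-bit counters per module
    swap = swap_buffers * swap_buffer_bits     # 128 swap buffers of 42 bits
    way_list = m * M * ceil(log2(A))           # m * M entries of log2(A) bits
    return counters + swap + way_list


extra_bits = mwwr_storage_bits()               # 20480 + 5376 + 2048 = 27904
cache_bits = 16 * 2**20 * 8                    # 16 MB data array
overhead_percent = 100 * extra_bits / cache_bits
```

With the reference configuration this gives 27,904 bits, or about 0.02% of the data array, which is in line with the overhead stated in the text.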
