
2.4 Improving the Lifetime and the Endurance of Non-Volatile Cache

Chapter 2. Background 47

Another hurdle with the employment of non-volatile caches is their weaker write endurance. In a real-time execution environment, this lower endurance affects the lifetime of the non-volatile cache. Additionally, owing to the differing working-set sizes and run-time access patterns of applications, the lifetime of the non-volatile cache is further affected by write variations (categorized as inter-set and intra-set write variation, as reported in Section 2.2.2). To improve the lifetime and mitigate these unwanted write variations, different kinds of wear leveling techniques have been proposed over the years. This section illustrates all such policies by categorizing them into several sub-levels. Figure 2.11 classifies the wear leveling policies into these sub-levels.

Figure 2.11: Classification of wear leveling techniques (wear leveling policies divided into Intra-Set, Inter-Set, Cell Level, and Miscellaneous)

2.4.1 Intra-Set Wear Leveling Policies

The goal of an intra-set wear leveling technique is to balance out the uneven writes inside the cache sets. Through this wear leveling, some of the cache blocks are prevented from entertaining more writes than the other blocks inside the set.

In the year 2013, the first intra-set wear leveling technique, Probabilistic set Line Flush (PoLF), was proposed by Wang et al. [29] in i2WAP. The policy invalidates a cache block after a fixed number of writes determined by the Flush Threshold (FT). For this, a counter is used that is incremented after each write to a cache bank. The selection of a block for invalidation is based on a probabilistic method rather than a deterministic one. In their work, most of the time, the technique chooses the hottest data (write-intensive data) inside the cache set.

Subsequently, the method flushes the hot data without changing its replacement information; the policy thereby ensures that the placement of the hot block on a subsequent miss will happen at another location in the cache set. To facilitate this process, the procedure uses two global counters and two registers.
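The PoLF mechanism can be sketched as follows. This is a minimal illustration, not the actual i2WAP implementation: the class and field names are invented for clarity, and the bank is reduced to a single write counter plus one cache set.

```python
class PoLFBank:
    """Sketch of Probabilistic set Line Flush (PoLF).

    A counter tracks writes to the bank; once it reaches the Flush
    Threshold (FT), the block targeted by the next write hit is
    invalidated instead of written, without updating the replacement
    information, so its refill lands at another location in the set.
    Because hot blocks receive most write hits, the flushed block is
    probabilistically the hottest one.
    """

    def __init__(self, flush_threshold):
        self.ft = flush_threshold
        self.writes = 0  # write counter for the bank

    def on_write_hit(self, cache_set, way):
        self.writes += 1
        if self.writes >= self.ft:
            # The hit block is statistically likely to be hot data;
            # flush (invalidate) it and reset the counter.
            cache_set[way] = None
            self.writes = 0
            return "flushed"
        return "written"
```

Note that no explicit hotness tracking is needed: the selection is probabilistic simply because write-intensive blocks are the most likely targets when the counter saturates.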

Later, in 2014, EqualChance, reported by Mittal et al. [31], added a counter, numWrite, to each cache set. The numWrite counter is incremented on each write to the cache set. Once numWrite reaches the threshold Υ, on the next write access to a block, a transfer/swap operation with an invalid/clean cache line in the set takes place and numWrite is reset. If no clean/invalid cache line is present in the set, a normal write operation is performed at the cache line's current location. A technique named LastingNVCache, presented by Mittal et al. [94], associates a 4-bit write counter with each block in the cache. The counter maintains the number of writes received by the block in a single generation. Once the counter reaches a specified limit, the write operation is skipped by invalidating the block without updating the replacement information. Another technique, WriteSmoothing, proposed by Mittal et al. [33], partitions the cache into multiple modules with an equal number of cache sets. In this work, the write variation inside each module is reduced by turning off the hot sub-ways and transferring their data to cold sub-ways within the module.
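The EqualChance idea for a single set can be sketched as below. The class and field names are illustrative simplifications, not the authors' implementation; dirty tracking and the write-back of displaced data are reduced to flags.

```python
class EqualChanceSet:
    """Sketch of the EqualChance policy for one cache set.

    A per-set counter (num_write) is incremented on every write; once
    it exceeds the threshold, the next written block is swapped with
    an invalid or clean way, if one exists, and the counter resets.
    Otherwise the write proceeds in place as usual.
    """

    def __init__(self, ways, threshold):
        self.data = [None] * ways      # None marks an invalid way
        self.dirty = [False] * ways
        self.num_write = 0
        self.threshold = threshold

    def write(self, way, value):
        self.num_write += 1
        if self.num_write > self.threshold:
            victim = next((w for w in range(len(self.data))
                           if w != way and
                           (self.data[w] is None or not self.dirty[w])),
                          None)
            if victim is not None:
                # Redirect the hot write to the cold way and move the
                # victim's (clean or empty) contents to the hot slot.
                self.data[way], self.data[victim] = self.data[victim], value
                self.dirty[victim] = True
                self.num_write = 0
                return victim
        self.data[way] = value         # normal in-place write
        self.dirty[way] = True
        return way
```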

Other recent and noteworthy works in intra-set wear leveling are reported in [32, 95, 96, 97]. In [95], a technique called ENVLIVE is illustrated, where a small SRAM storage called the HotStore is added to hold the write-intensive blocks of the cache. Only those blocks that incur a specific number of writes (determined by λ) are eligible for placement in the HotStore. The technique named EqualWrite, reported in [96], allows write redirection and swapping of blocks based on the difference in the write counts of the cache lines. The Write-back Aware intra-set Displacement (WAD) approach proposed by Jokar et al. [32] in Sequoia employs a counter with each cache set that is incremented on each write to the set. Upon saturation of the counter, on the next write hit to the set, the policy displaces the written block to the victim cache line's location, invalidating the victim line and writing it back to main memory. A hybrid random replacement policy is presented in [97] that periodically switches between the traditional replacement policy and a random replacement policy to shuffle the actively written lines in a cache set.
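The count-difference redirection underlying EqualWrite-style schemes can be sketched as follows. This is a loose illustration under assumed parameters (per-line counters and a redirection margin `delta`), not the exact policy of [96].

```python
class EqualWriteSet:
    """Sketch of write redirection driven by per-line write counts.

    Each line in the set keeps a write counter; when the targeted
    line's count exceeds the coldest line's count by more than
    `delta`, the write is redirected to the coldest line instead,
    evening out the intra-set write distribution over time.
    """

    def __init__(self, ways, delta):
        self.writes = [0] * ways
        self.delta = delta

    def choose_way(self, way):
        coldest = min(range(len(self.writes)), key=self.writes.__getitem__)
        if self.writes[way] - self.writes[coldest] > self.delta:
            way = coldest              # redirect to the least-written way
        self.writes[way] += 1
        return way
```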

The above-discussed intra-set wear leveling techniques reduce the write variation inside the cache set and improve the lifetime. However, the extra counters associated with these techniques incur additional storage and area overheads on the system.

2.4.2 Inter-Set Wear Leveling Policies

The inter-set wear leveling approaches aim to balance out the uneven write distribution across the cache sets. Through this wear leveling, some of the cache sets are prevented from wearing out faster than the other cache sets inside the bank.

The first inter-set wear leveling technique, Swap Shift, presented by Wang et al. [29] in i2WAP, changes the mapping of the cache sets after a fixed number of writes, determined by the Swap Threshold. In their work, the mapping of a set is adjusted by rotating the data of the cache sets. Chen et al. [98] presented an inter-set wear leveling technique that changes the mapping of the cache sets at regular intervals by performing an XOR operation between the content of a remap register and the set index of the block. Here, the content of the remap register is changed at the end of each interval of the application's execution. A software-controlled inter-set wear leveling approach that changes the cache-color page mapping based on the write traffic in the cache is presented in [99, 100].
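The XOR-based remapping of Chen et al. [98] reduces to a one-line function; the sketch below assumes a power-of-two number of sets so that the XOR result stays within range.

```python
def remap_set_index(set_index, remap_register, num_sets):
    """Remap a block's set index by XOR-ing it with the remap
    register, as in the interval-based scheme of Chen et al. [98].
    The register value changes at the end of each execution interval,
    redistributing writes across the sets."""
    assert num_sets > 0 and num_sets & (num_sets - 1) == 0
    return (set_index ^ remap_register) & (num_sets - 1)
```

A convenient property of this design is that XOR remapping is its own inverse: applying the same register value twice recovers the original index, and every register value yields a permutation of the sets, which simplifies relocating blocks when the register changes.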

Other recent works on inter-set wear leveling are reported in [32, 101]. In [32], a technique named Grouped Access Intra-Set Swapping is proposed in Sequoia that changes the cache set mapping between the heavily written and the lightly written sets of a group with the help of counters. In [101], Soltani et al. proposed an approach that partitions the cache into multiple clusters. During execution, the clusters change their mapping to counter the inter-set write variation using the write intensity, the mapping history, and the number of clean/invalid blocks.

The above-discussed approaches improve the lifetime and try to overcome the write variation across the cache sets by changing the mappings of the sets. However, rearranging the data according to the newly generated cache set mapping requires swaps or invalidations that consume extra energy as well as impact the performance.

2.4.3 Cell Level Wear Leveling

Apart from performing wear leveling at the granularity of a block, researchers have also tried to perform wear leveling at the memory cell level. This subsection discusses all such proposals.

Joo et al. [102] presented a technique to reduce the uneven write distribution inside the cache block by using a bit-line shifter. The shifter is used to spread the writes over the whole PCM cell array of the cache. To aid this process, two registers, the Shift Offset Register (SOR) and the Shift Interval Counter (SIC), are used to record how many data bits are shifted in a cache block and the number of writes performed to the cache block since the SOR was last updated. A frequent data encoding scheme that largely reduces the number of redundant writes is proposed in [103].
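The interplay of the SOR and SIC registers can be sketched as below. The class and parameter names are illustrative; only the address arithmetic is modelled, not the shifter hardware.

```python
class BitLineShifter:
    """Sketch of the bit-line shifting idea of Joo et al. [102].

    The Shift Offset Register (SOR) holds the current rotation of the
    data bits within a block; the Shift Interval Counter (SIC) counts
    writes to the block and advances the SOR every `interval` writes,
    so that bit-level wear spreads over the whole line instead of
    concentrating on a few frequently toggled cells.
    """

    def __init__(self, line_bits, interval):
        self.line_bits = line_bits
        self.interval = interval
        self.sor = 0   # how many bit positions the data is shifted
        self.sic = 0   # writes since the SOR was last updated

    def on_write(self):
        self.sic += 1
        if self.sic == self.interval:
            self.sor = (self.sor + 1) % self.line_bits
            self.sic = 0

    def physical_bit(self, logical_bit):
        """Physical cell position currently storing a logical bit."""
        return (logical_bit + self.sor) % self.line_bits
```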

The proposed technique is motivated by the fact that the Hamming distance between frequently occurring codes is small. Thus, the data encoding scheme has the advantage of reducing the wear-out issue and improving the lifetime of the STT-based NVM cell. Another frequent-pattern-based data encoding scheme to reduce the non-uniform write distribution is presented in [104]. In the proposed work, the frequent data write patterns are categorized into two types: deterministic and non-deterministic. The frequent patterns are tracked by using dynamic profiling. During the application's execution, these frequent data patterns get encoded, and their appropriate code bits are stored in the tag part of the STT-RAM cache.

Recently, a word-level write variation reduction scheme that exploits the narrow-width data of a word to reduce unnecessary writes in the STT-RAM cache was reported in [105].


2.4.4 Miscellaneous Wear Leveling

This subsection reports all other techniques that improve the write endurance of non-volatile memories either by using a hybrid cache or by exploiting write reduction strategies.

A simple address space randomization technique that performs wear leveling by moving a line to its neighboring location is reported in [106]. To aid this process, two registers, Start and Gap, and an extra line of memory space, the Gapline, are used.

The Start register counts the number of times all the cache lines in the memory have been relocated, and the Gap register counts the number of cache lines relocated so far. A hybrid cache architecture (HCA) based wear leveling technique, Ayush, reported by Mittal et al. [107], migrates write-intensive data to the SRAM region of the HCA. In this work, upon a write to the NVM region, the possibility of migration is checked by comparing the LRU age information of the NVM block and the victim block in the SRAM. If the SRAM contains older data, the migration operation is performed. Sturkov et al. [108] presented an endurance-aware memory design that implements slower writes to reduce the stress on the NVM cell and improve its lifetime. A ReRAM-based NUCA architecture that performs wear leveling in a performance-conscious way by using a critical line predictor is reported by Kotra et al. [109]. Recently, an L1-cache-based endurance-aware data allocation strategy was proposed by Farbeh et al. [110]. The proposed work is motivated by the fact that the endurance of the I-cache is larger than that of the D-cache in terms of the number of writes it receives. The strategy periodically makes alternate use of the D-cache as the I-cache and the I-cache as the D-cache.
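The Start and Gap mechanism of [106] described above can be sketched as follows; only the address mapping is modelled (data copying during a gap movement is indicated in comments), and the class and method names are illustrative.

```python
class StartGap:
    """Sketch of the Start/Gap address randomization of [106].

    One spare ("gap") line is kept alongside N data lines. Each gap
    movement shifts one neighbouring line into the gap; after the gap
    has walked through the whole array, every line has moved by one
    position and the Start register advances, so writes gradually
    spread over all physical locations.
    """

    def __init__(self, num_lines):
        self.n = num_lines
        self.start = 0          # times the whole array has rotated
        self.gap = num_lines    # current position of the spare line

    def move_gap(self):
        if self.gap == 0:
            # Gap wraps: in hardware the line in the spare slot moves
            # to slot 0, and every line has now shifted by one.
            self.gap = self.n
            self.start = (self.start + 1) % self.n
        else:
            # In hardware: copy the line at (gap - 1) into the gap.
            self.gap -= 1

    def physical(self, logical):
        p = (logical + self.start) % self.n
        if p >= self.gap:
            p += 1              # skip over the gap slot
        return p
```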