
Chapter 2. Background 51

2.4.4 Miscellaneous Wear Leveling

This subsection covers the remaining techniques that improve the write endurance of non-volatile memories, either by using a hybrid cache or by exploiting write-reduction strategies.

A simple address-space randomization technique that performs wear leveling by moving a line to its neighboring location is reported in [106]. To aid this process, two registers, Start and Gap, and an extra memory line, Gapline, are used.

The Start register counts the number of times all the cache lines in the memory have been relocated, and the Gap register counts the number of cache lines relocated in the current round. A hybrid cache architecture based wear-leveling technique, Ayush, reported by Mittal et al. [107], migrates write-intensive data to the SRAM region of the HCA. In this work, upon a write to the NVM region, the possibility of migration is checked by comparing the LRU age information of the NVM block and the victim block in the SRAM. If the SRAM block holds older data, the migration is performed. Sturkov et al. [108] presented an endurance-aware memory design that implements slower writes to reduce the stress on the NVM cells and improve their lifetime. A ReRAM-based NUCA architecture that performs wear leveling in a performance-conscious way using a critical-line predictor is reported by Kotra et al. [109]. Recently, an L1-cache-based endurance-aware data allocation strategy was proposed by Farbeh et al. [110]. The proposed work is motivated by the fact that the I-cache receives far fewer writes than the D-cache, so its cells wear out more slowly. The strategy therefore periodically swaps the roles of the two caches, using the D-cache as the I-cache and the I-cache as the D-cache.
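The Start/Gap mechanism above can be sketched in a few lines. The following is a simplified, illustrative model of a Start-Gap-style scheme, not the exact design of [106]: the gap line is rotated one position after every `PSI` writes (the interval `PSI` and all names are placeholders), and the Start register advances once per completed rotation round.

```python
PSI = 4  # hypothetical: rotate the gap once every PSI writes

class StartGapMemory:
    def __init__(self, n_lines):
        self.N = n_lines
        self.mem = [None] * (n_lines + 1)  # N data lines + 1 spare gap line
        self.start = 0          # completed rotation rounds (mod N)
        self.gap = n_lines      # current physical position of the gap
        self.writes = 0

    def translate(self, la):
        """Map a logical line address to its current physical slot."""
        pa = (la + self.start) % self.N
        if pa >= self.gap:      # slots at or above the gap are shifted by one
            pa += 1
        return pa

    def _gap_move(self):
        """Shift the gap one slot down, relocating one neighboring line."""
        if self.gap == 0:       # gap wrapped around: begin a new round
            self.mem[0] = self.mem[self.N]
            self.gap = self.N
            self.start = (self.start + 1) % self.N
        else:
            self.mem[self.gap] = self.mem[self.gap - 1]
            self.gap -= 1

    def write(self, la, value):
        self.mem[self.translate(la)] = value
        self.writes += 1
        if self.writes % PSI == 0:
            self._gap_move()

    def read(self, la):
        return self.mem[self.translate(la)]
```

Because the mapping is a simple arithmetic function of Start and Gap, no per-line remapping table is needed; every line slowly visits every physical slot, spreading writes across the array.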

On-chip caches occupy a significant amount of chip area. Traditional caches built from charge-based memory technologies such as SRAM/DRAM consume substantial leakage power under continuous process scaling and fail to fulfill application demands in terms of scalability. To mitigate this, computer architects have moved towards emerging NVMs and view them as alternative memory technologies in the cache hierarchy [111]. The gains from NVMs are low leakage power consumption, high density, multi-bit storage capability, and excellent scalability. However, employing NVMs in the cache hierarchy brings costly write operations and weak write endurance, which impact the performance, energy consumption, and lifetime of the caches.

Over the years, many attempts have been made to counter the costly write operations of NVM caches using reconfiguration policies and bypass techniques. Further, to reduce expensive write operations in NVMs, researchers exploit the best characteristics that each memory technology offers through the Hybrid Cache Architecture. In HCA, block placement is the most challenging issue: the appropriate block must be placed in the proper region. In this context, different efforts address block placement using block migration schemes, region-based prediction techniques, and cache partitioning strategies.

To protect the cache from the write variation exhibited under concurrent execution of multiple applications, researchers have proposed strategies at different granularities of the cache. In particular, write variation inside a cache set is mitigated by intra-set wear-leveling techniques, while write variation across cache sets is alleviated by inter-set wear-leveling techniques. Also, instead of concentrating on wear leveling at the block level, architects cope with the non-uniform write distribution inside the content of a block using cell-level wear leveling. Further, along with wear-leveling techniques, the endurance of a non-volatile cache is enhanced by write-reduction schemes and by using the HCA.
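As a concrete illustration of the intra-set case, one generic mechanism (not the exact design of any cited scheme; the interval `SWAP_INTERVAL` and all names are invented here) keeps a small per-set rotation offset and bumps it periodically, so that a write-hot logical way is mapped onto different physical ways over time; a real implementation would also relocate or invalidate the affected blocks when the offset changes.

```python
SWAP_INTERVAL = 64  # hypothetical: rotate the mapping every 64 writes

class IntraSetRotator:
    """Per-set logical-to-physical way remapping for intra-set wear leveling."""

    def __init__(self, num_ways):
        self.ways = num_ways
        self.offset = 0   # current rotation offset for this set
        self.writes = 0   # writes seen by this set

    def physical_way(self, logical_way):
        return (logical_way + self.offset) % self.ways

    def on_write(self):
        self.writes += 1
        if self.writes % SWAP_INTERVAL == 0:
            # Remap; in hardware the blocks would be shifted accordingly.
            self.offset = (self.offset + 1) % self.ways
```

Even if one logical way receives every write, the physical writes are spread evenly across all ways of the set.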

Thus, careful management of writes in NVM technologies can make them suitable candidates in the cache hierarchy for an efficient hardware system.

Chapter 3

Reducing Write Cost by Dataless Entries and Prediction

In this chapter, we discuss the first contribution towards the longevity enhancement of non-volatile caches. We propose a data allocation policy that reduces the number of writes and the energy consumption of the STT-RAM region in the hybrid last-level cache by considering the existence of private blocks. In addition, we employ a predictor that redirects write-backs from the L1 to the SRAM region of the hybrid cache, depending on the predicted reuse-distance-aware write intensity. The proposed work is evaluated against two existing techniques on dual- and quad-core systems.

3.1 Introduction

As discussed in Chapter 2, block placement in the HCA is a challenging issue. Several strategies for HCA have been proposed previously, but to the best of our knowledge, none of the existing literature exploits the existence of private blocks. Private blocks are those requested by a single core, whose request the cache controller serves with exclusive permission (i.e., for both read and write operations). A block requested by multiple cores, in contrast, is served with shared (read-only) permission to the requesting cores. For private blocks, most of the time the copy in the L2 (or LLC) cache contains stale data [112, 113] (conclusive evidence is given in Section 3.2.1). For such blocks, the actual up-to-date data arrives only when the block is written back from the L1. In other words, the data in the L2 cache is not needed until the L1 cache performs a write-back. Hence, when a private block is loaded into the L2 cache, the data part of the block is not stored in the data array. To maintain such dataless entries in the L2 cache, we modify the conventional MESI protocol by adding new states and their associated transitions. Our policy also uses a predictor to decide whether or not the first write-back from the owner L1 cache to a dataless entry in the non-volatile region should be redirected to the SRAM region. The predictor's decision is based on the reuse-distance-aware write intensity of the block. Further, in this chapter we also present replacement policies for the different regions of the hybrid cache.

In this work, our proposed policy exploits private blocks and avoids storing their data part in the non-volatile region of the LLC. We use STT-RAM as the non-volatile region of the hybrid cache, although the policy can be easily extended to PCRAM- or ReRAM-based hybrid caches.
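The allocation and redirection decisions described above can be summarized in a short sketch. This is an illustrative outline only, under assumed interfaces: `Region`, `place_loaded_block`, `redirect_first_writeback`, the intensity table, and the threshold are all hypothetical names and parameters, not definitions from the thesis.

```python
from enum import Enum

class Region(Enum):
    STT_DATALESS = 1   # tag-only (dataless) entry in the non-volatile region
    STT_NORMAL = 2     # ordinary entry in the non-volatile region
    SRAM = 3           # volatile region of the hybrid cache

WRITE_INTENSITY_THRESHOLD = 4  # hypothetical tuning parameter

def predicted_write_intensity(addr, table):
    """Illustrative stand-in for a reuse-distance-aware write-intensity lookup."""
    return table.get(addr, 0)

def place_loaded_block(is_private):
    # A block fetched from main memory: private blocks are held dataless in
    # STT-RAM, since their data stays stale until the owner L1 writes back.
    return Region.STT_DATALESS if is_private else Region.STT_NORMAL

def redirect_first_writeback(addr, intensity_table):
    # On the first write-back from the owner L1 to a dataless entry, blocks
    # predicted to be write-intensive are redirected to the SRAM region to
    # spare the STT-RAM cells; the rest fill their STT-RAM entry normally.
    if predicted_write_intensity(addr, intensity_table) >= WRITE_INTENSITY_THRESHOLD:
        return Region.SRAM
    return Region.STT_NORMAL
```

The key design point the sketch captures is that the expensive STT-RAM write is deferred until the first write-back, and even then it is avoided entirely for blocks the predictor expects to be rewritten soon.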

The main contributions of this work are as follows:

• We consider the existence of private blocks in the HCA and avoid storing their data part in the non-volatile region.

• To maintain dataless entries in the STT region of the hybrid cache, we make changes in the conventional MESI protocol. These changes include the addition of new states and their associated transitions to the existing protocol.

• We propose a predictor to decide whether or not the first write-back from the owner L1 cache to the dataless entries in the non-volatile region should be redirected to the SRAM region. The decision of our predictor is influenced by the reuse distance aware write intensity of the block.


[Flow diagram: a block loaded from main memory is checked for private status (Sec. 3.3.1) and placed as a dataless or normal entry in the STT-RAM/SRAM regions; write-backs from the upper-level cache pass through the RDAWIP predictor (Sec. 3.3.2) and its decision logic (Sec. 3.3.3); the replacement logic (Sec. 3.3.4) selects the block evicted to main memory.]

Figure 3.1: General overview of the contribution in Chapter 3

• We also propose a replacement policy for the different regions of the hybrid cache.

• The proposed techniques are evaluated against two existing techniques: the Read-Write aware Hybrid Cache Architecture [27, 28] and the Write Intensity prediction technique [30]. Experimental results show a reduction in overall writes along with savings in energy.

Figure 3.1 presents the general overview of the proposed contributions in this chapter.

The rest of the chapter is organized as follows: motivation and background are presented in Section 3.2. Section 3.3 illustrates the proposed hybrid cache architecture along with the concept of set sampling. Section 3.4 discusses the experimental methodology. Results and analysis are presented in Section 3.5. Finally, we summarize the chapter in Section 3.6.


[Bar chart; y-axis ranges from 96% to 100%.]

Figure 3.2: Percentage of private and shared blocks brought from the main memory on an L2 miss

[Bar chart; y-axis ranges from 0% to 100%.]

Figure 3.3: Percentage of blocks having exclusive permission which are clean or dirty at the time of replacement