
2.3 Reducing the Costly Write Operations

2.3.1 Migration Policies

Chapter 2. Background 37

Figure 2.10: Classification of write reduction techniques based on the type of caches (for HCA and for NVM caches). The categories shown are block migration, region-based, prediction, cache partitioning, bypass, and reconfiguration policies.

The initial placement of a block into these regions depends on the type of access that causes the miss. On a read miss, the incoming block is placed in the (read) STT region; on a write miss, it is placed in the (write) SRAM region. In addition, each cache block is given a 2-bit saturating counter that tracks its region-wise accesses. An imbalance between read and write accesses in either region triggers the migration of the block to the other region. The policy achieves a 55% power reduction over the baseline SRAM cache and a 5% IPC improvement over the baseline STT cache.
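The placement-plus-counter scheme described above can be sketched in a few lines. This is a minimal illustration, not the published design: the text does not say exactly how the 2-bit counter is updated, so the sketch assumes that an access unsuited to the block's current region (a write in STT, a read in SRAM) increments the counter, a suited access decrements it, and the block migrates when the counter saturates at 3. The names `Block`, `place_on_miss`, and `access` are hypothetical.

```python
from dataclasses import dataclass

SRAM, STT = "SRAM", "STT"   # write region and read region of the hybrid cache
COUNTER_MAX = 3             # a 2-bit saturating counter holds the values 0..3

@dataclass
class Block:
    tag: int
    region: str
    counter: int = 0  # tracks accesses that do not suit the current region

def place_on_miss(tag: int, is_write: bool) -> Block:
    """Read misses fill the STT (read) region; write misses fill the SRAM (write) region."""
    return Block(tag, SRAM if is_write else STT)

def access(block: Block, is_write: bool) -> bool:
    """Update the block's counter on a hit; return True if the block migrated."""
    # Assumption: a mismatched access (write in STT, read in SRAM) increments the
    # counter and a matching access decrements it, saturating at 0 and COUNTER_MAX.
    mismatched = is_write if block.region == STT else not is_write
    if mismatched:
        block.counter = min(block.counter + 1, COUNTER_MAX)
    else:
        block.counter = max(block.counter - 1, 0)
    if block.counter == COUNTER_MAX:
        # Saturation signals a read/write imbalance: move to the other region.
        block.region = SRAM if block.region == STT else STT
        block.counter = 0
        return True
    return False
```

For example, a block filled by a read miss lands in STT, and three writes in a row saturate its counter and move it to SRAM. A decrement on matching accesses (rather than a reset) keeps occasional mismatches from triggering a migration.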

Li et al. [58] proposed a microarchitectural mechanism that accounts for the different write patterns in a hybrid LLC. In this approach, each bank of the hybrid LLC is built from either SRAM or STT-RAM. Block placement is the same as in RWHCA: all store-miss blocks are placed in the target SRAM bank near the requesting core, and all load-miss blocks are placed in the core's private STT-RAM bank. A write-intensive line is migrated from STT-RAM to SRAM when the block receives two consecutive writes or three cumulative writes. Conversely, migration from SRAM to STT-RAM is performed in one of two ways, Active or Lazy, depending on where the block came from. If the block was fetched from lower-level memory, active migration is triggered after a single read hit to the block.

Otherwise, if the block was swapped in from the STT-RAM, lazy migration is triggered only after two read hits to the block.
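The migration triggers of this scheme can be modeled as a small per-line state machine. This is a sketch under stated assumptions: the cumulative-write threshold of three, the rule that a read breaks a run of consecutive writes, and the names `LineState` and `on_hit` are illustrative choices, not details confirmed by [58].

```python
from dataclasses import dataclass

@dataclass
class LineState:
    bank: str                 # "SRAM" or "STT"
    from_stt: bool = False    # True if the line was swapped into SRAM from STT-RAM
    consecutive_writes: int = 0
    cumulative_writes: int = 0
    read_hits: int = 0

CONSECUTIVE_WRITE_LIMIT = 2   # two consecutive writes trigger STT -> SRAM
CUMULATIVE_WRITE_LIMIT = 3    # assumed cumulative-write threshold
ACTIVE_READ_HITS = 1          # active migration: block fetched from lower memory
LAZY_READ_HITS = 2            # lazy migration: block swapped in from STT-RAM

def on_hit(line: LineState, is_write: bool) -> bool:
    """Process one hit; return True if the line migrates to the other bank."""
    if line.bank == "STT":
        if is_write:
            line.consecutive_writes += 1
            line.cumulative_writes += 1
        else:
            line.consecutive_writes = 0  # assumption: a read breaks the write run
        if (line.consecutive_writes >= CONSECUTIVE_WRITE_LIMIT
                or line.cumulative_writes >= CUMULATIVE_WRITE_LIMIT):
            # Write-intensive line: move it to SRAM and remember its origin.
            line.bank, line.from_stt = "SRAM", True
            line.consecutive_writes = line.cumulative_writes = line.read_hits = 0
            return True
        return False
    # Line is in SRAM: read hits move it back to STT-RAM (active or lazy).
    if not is_write:
        line.read_hits += 1
        needed = LAZY_READ_HITS if line.from_stt else ACTIVE_READ_HITS
        if line.read_hits >= needed:
            line.bank = "STT"
            line.read_hits = 0
            return True
    return False
```

The `from_stt` flag is what distinguishes the two SRAM-to-STT paths: a store-miss block fetched from memory starts with it cleared and leaves after one read hit, while a block that was just pulled out of STT-RAM must prove itself read-dominant with two read hits, which damps ping-ponging.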

In 2012, Chen et al. [59] proposed static and dynamic approaches, based on compiler hints, for placing blocks in the different regions of a hybrid cache. Any misinterpretation or misprediction of a block's access pattern leads to the migration of the block from one region to the other.

Guo et al. [60] proposed a wear-resistant block placement approach for hybrid caches in which every cache line is initially placed in the SRAM region. During the lifetime of a cache block, the line is classified as Dead on Load (DoL) or Write Intensive (WI) based on its accesses. When a line is evicted from the SRAM, it is migrated to the non-volatile region unless it has been classified as DoL or WI.
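The eviction filter in Guo et al.'s approach reduces to a single predicate on the evicted SRAM line. The sketch below is only an illustration: the text does not define how DoL and WI are detected, so it assumes a line is DoL if it receives no hits after being loaded and WI if it receives at least a hypothetical `WI_WRITE_THRESHOLD` writes. It also does not model what happens to DoL or WI lines, since the text does not say.

```python
from dataclasses import dataclass

WI_WRITE_THRESHOLD = 4  # hypothetical: writes needed to label a line write-intensive

@dataclass
class SramLine:
    hits_since_fill: int = 0  # hits received after the line was loaded into SRAM
    writes: int = 0           # write hits received while in SRAM

    def record(self, is_write: bool) -> None:
        self.hits_since_fill += 1
        if is_write:
            self.writes += 1

    def is_dead_on_load(self) -> bool:
        # Assumption: a line never reused after its fill is Dead on Load (DoL).
        return self.hits_since_fill == 0

    def is_write_intensive(self) -> bool:
        return self.writes >= WI_WRITE_THRESHOLD

def migrate_on_sram_eviction(line: SramLine) -> bool:
    """Return True if the evicted SRAM line should move to the non-volatile region."""
    return not (line.is_dead_on_load() or line.is_write_intensive())
```

Filtering out both classes protects the non-volatile region: DoL lines would waste an NVM write on data that is never reused, and WI lines would concentrate wear on NVM cells.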

Later, in 2014, an adaptive block placement and migration policy was proposed in [61]. This work categorizes blocks by the type of write: prefetch write, demand write, and core write. Block placement is based on the access pattern and the type of write, while migration decisions are made using a predictor. In particular, a block fetched on a prefetch miss is placed directly into the SRAM region. Upon its eviction, the predictor checks whether the evicted block is dead; if it is not, the block is migrated to the STT region. On a core-write miss, the block is written directly to main memory. On a hit in the STT region, on the other hand, the predictor estimates whether the block is likely to receive a burst of writes, and if so the block is migrated to SRAM. On a demand miss, the predictor checks whether the incoming block is dead; if it is, the block bypasses the cache.

Otherwise, the block is placed in the STT region.
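The case analysis of this adaptive policy maps naturally onto a decision function. This is a sketch, not the implementation from [61]: the event names, the `Action` enum, and the single boolean `prediction` standing in for the predictor are all hypothetical, and a live block evicted from SRAM is assumed to move to the STT-RAM region.

```python
from enum import Enum

class Action(Enum):
    PLACE_SRAM = "place in SRAM"
    PLACE_STT = "place in STT-RAM"
    BYPASS = "bypass the LLC"
    WRITE_TO_MEMORY = "write directly to main memory"
    MIGRATE_TO_SRAM = "migrate to SRAM"
    MIGRATE_TO_STT = "migrate to STT-RAM"
    EVICT = "evict"
    STAY = "no change"

def decide(event: str, prediction: bool = False) -> Action:
    """Map a cache event and a (hypothetical) predictor output to an action.

    For "sram_eviction" and "demand_miss", prediction means "block is dead";
    for "stt_hit", it means "a write burst is expected".
    """
    if event == "prefetch_miss":
        return Action.PLACE_SRAM
    if event == "sram_eviction":
        # Assumed reading: a live block leaving SRAM moves to STT-RAM.
        return Action.EVICT if prediction else Action.MIGRATE_TO_STT
    if event == "core_write_miss":
        return Action.WRITE_TO_MEMORY
    if event == "stt_hit":
        return Action.MIGRATE_TO_SRAM if prediction else Action.STAY
    if event == "demand_miss":
        return Action.BYPASS if prediction else Action.PLACE_STT
    raise ValueError(f"unknown event: {event}")
```

Keeping the policy as a pure function of (event, prediction) makes it easy to swap in different predictors when exploring the design space.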

Wang et al. [62] proposed a dynamic reallocation strategy for the partitions of a hybrid L1 cache. Cache blocks are transferred between the two regions using one of two mechanisms: Immediate Transfer and Delayed Transfer. In immediate transfer, a remote read to an SRAM block moves the block to STT-RAM, and a remote write to an STT-RAM block moves the block to SRAM. In delayed transfer, by contrast, a block is not transferred until it has received two reads (for an SRAM block) or two writes (for an STT-RAM block).
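The two transfer mechanisms differ only in how many qualifying accesses a block must see before it moves, so a single threshold parameter captures both. In this sketch the names `L1Block` and `remote_access` are hypothetical, and it assumes that accesses which do not favor a move (writes to SRAM blocks, reads from STT-RAM blocks) leave the pending count unchanged, which the text does not specify.

```python
from dataclasses import dataclass

@dataclass
class L1Block:
    region: str       # "SRAM" or "STT"
    pending: int = 0  # remote accesses seen so far that favor the other region

def remote_access(block: L1Block, is_write: bool, delayed: bool) -> bool:
    """Apply one remote access; return True if the block was transferred."""
    # Reads to SRAM blocks favor STT-RAM; writes to STT-RAM blocks favor SRAM.
    favors_move = (block.region == "SRAM" and not is_write) or \
                  (block.region == "STT" and is_write)
    if not favors_move:
        return False  # assumption: the pending count is left unchanged
    block.pending += 1
    threshold = 2 if delayed else 1  # delayed transfer waits for a second access
    if block.pending >= threshold:
        block.region = "STT" if block.region == "SRAM" else "SRAM"
        block.pending = 0
        return True
    return False
```

Immediate transfer reacts fastest but may move blocks that are touched only once; delayed transfer trades one extra mismatched access for fewer unnecessary transfers.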

Other notable recent works on migration, published within the past three years, are [63] and [64]. In [63], the cache replacement policy is modified to partition the replacement stack into two regions: Reserved and Victim. The decision to place and migrate cache lines from the different parts of the hybrid main memory into these regions of the replacement stack is based on the Average Memory Access Time (AMAT). In [64], a priority-based data migration scheme is proposed for 3D hybrid caches to reduce migration jitter for frequently accessed data. In this approach, data blocks are migrated within and between layers, with priority given to the X direction, followed by the Y direction and then the Z direction. The policy reduces power consumption by 34.5%.
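The X-then-Y-then-Z ordering used in [64] resembles dimension-order routing on a mesh. The sketch below assumes cache banks sit on a 3D grid with integer (x, y, z) coordinates, where z selects the layer, and that a block moves one hop at a time; `xyz_migration_path` is a hypothetical name, and the priority handling for frequently accessed data is not modeled.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]  # (x, y, z): bank position; z selects the layer

def xyz_migration_path(src: Coord, dst: Coord) -> List[Coord]:
    """Hop-by-hop path that resolves X first, then Y, then Z (between layers)."""
    path = [src]
    cur = list(src)
    for axis in range(3):  # 0 = X, 1 = Y, 2 = Z, in priority order
        step = 1 if dst[axis] > cur[axis] else -1
        while cur[axis] != dst[axis]:
            cur[axis] += step
            path.append(tuple(cur))
    return path
```

For example, `xyz_migration_path((0, 0, 0), (2, 1, 1))` visits (1, 0, 0) and (2, 0, 0) within the bottom layer, then (2, 1, 0), and only then crosses to layer 1 at (2, 1, 1). A fixed axis order makes every migration path deterministic, which is one way to keep the movement of a block predictable.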

All of the above block placement techniques migrate blocks from one region or layer to another. Moving blocks between regions or layers consumes extra energy and also degrades cache performance. Moreover, applications running on CMPs exhibit variable behavior, which in turn changes how blocks are accessed in the cache. In such cases, a large number of migrations can nullify the benefits obtained by the existing techniques.

From the document LiNoVo: Longevity Enhancement of Non-Volatile Caches by (pages 67-70)